Short tokens and numbers

Tue Nov 4 08:26:47 CET 2003

David Relson <relson at osagesoftware.com> wrote:

>> >>Maybe someone can explain why we use (my version):
>> >>TOKEN		{TOKENFRONT}{TOKENMID}{TOKENBACK}{0,70}
>> >>instead of
>> >>TOKEN		{TOKENFRONT}{TOKENMID}{0,70}{TOKENBACK}
>> >>where we had to modify TOKENMID (remove *) and TOKENBACK
>> >>(add  ?) appropriately.
>> 
>> Any answer for this?
>
>As I interpret the TOKENMID and TOKENBACK patterns, the first limits
>what's allowed as the first character while the second defines what's
>permitted in the middle and end positions.  Perhaps they should be named
>TOKEN_FIRST and TOKEN_REST (or TOKEN_HEAD and TOKEN_TAIL).

Obviously (ignoring the quantifiers) TOKENBACK is a proper
subset of TOKENMID, namely not allowing ._-+ (the order of the
characters is changed, which is irritating, but of course this
is not important). I don't know which characters need to be
escaped with \. But I am just trying to understand the
rationale for doing it this way.

Let me try to describe what we do: We start with TOKENFRONT
(one character). Then comes any number of TOKENMID, followed
by up to 70 characters of TOKENBACK. If quantifiers are greedy
in that language, then we actually never use TOKENBACK (in the
standard version, exactly one character here).
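
To make the greedy-quantifier argument concrete, here is a
small sketch in Python's re module rather than in the lexer
itself. The three character classes are simplified stand-ins
that I made up for illustration (they are not the real
definitions); the only property that matters is that BACK is a
proper subset of MID. A lex-style scanner picks the longest
overall match instead of backtracking, but because BACK is a
subset of MID the set of matched strings should be the same.

```python
import re

# Simplified stand-ins for the lexer classes (assumptions for
# illustration only, not the real definitions).
FRONT = r"[A-Za-z]"          # one leading letter
MID = r"[A-Za-z0-9._+-]"     # characters allowed inside a token
BACK = r"[A-Za-z0-9]"        # proper subset of MID (no . _ - +)

# Shape under discussion: FRONT MID* BACK{0,70}, with each part
# captured so we can see which part consumed which characters.
current = re.compile(f"({FRONT})({MID}*)({BACK}{{0,70}})")

text = "a" + "b" * 200
m = current.match(text)
front, mid, back = m.groups()

# MID* greedily takes everything, so BACK{0,70} matches nothing
# and the 70-character limit never takes effect.
print(len(front), len(mid), len(back))   # 1 200 0
print(len(m.group(0)))                   # 201
```

With these stand-in classes the BACK part stays empty and the
whole 201-character run is accepted as one token.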

Further, we would allow a sequence of any length. Didn't we
want to limit that?

So here is my idea of what it should have been:

:TOKENFRONT	[^[:blank:][:cntrl:][:digit:][:punct:]]
:TOKENMID	[^[:blank:]<>;=():&%$#@+|/\\{}^\"\?\*,[:cntrl:]\[\]]
:TOKENBACK	[^[:blank:]<>;=():&%$#@+|/\\{}^\"\?\*\._\-\+,\[\][:cntrl:]]
: 
:TOKEN		{TOKENFRONT}{TOKENMID}{1,70}{TOKENBACK}
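
As a rough check of what a pattern of this shape accepts, the
following Python sketch uses the same kind of simplified
stand-in classes (again assumptions for illustration, not the
definitions above) and a hypothetical helper first_token().
It is only meant to show the length limit and the effect of
the mandatory last character.

```python
import re

# Simplified stand-ins for the lexer classes (assumptions for
# illustration only, not the real definitions).
FRONT = r"[A-Za-z]"          # one leading letter
MID = r"[A-Za-z0-9._+-]"     # characters allowed inside a token
BACK = r"[A-Za-z0-9]"        # proper subset of MID (no . _ - +)

# Proposed shape: FRONT MID{1,70} BACK
proposed = re.compile(f"{FRONT}{MID}{{1,70}}{BACK}")


def first_token(text):
    """Return the token matched at the start of text, or None."""
    m = proposed.match(text)
    return m.group(0) if m else None


# A long run is cut off at 1 + 70 + 1 = 72 characters.
print(len(first_token("a" + "b" * 200)))   # 72

# The last character must come from BACK, so a trailing dot
# is left out of the token.
print(first_token("foo.bar."))             # foo.bar

# FRONT, at least one MID and one BACK are all required, so a
# two-character string does not match at all.
print(first_token("ab"))                   # None
```

One consequence worth noting: under this shape the shortest
possible token has three characters, which may or may not be
what is wanted for short tokens.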

pi