lexer changes

Wed Nov 12 01:22:06 CET 2003

On Tue, 11 Nov 2003 18:11:38 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> Andras Salamon wrote:
> 
> >> -ENCODED_WORD	=\?{CHARSET}\?[bq]\?[^\?]*\?\=
> >> +ENCODED_WORD	=\?{CHARSET}\?[bq]\?[^?]*\?=
> > 
> > Personally I would prefer some of the "extra" backslashes to stay.
> > The ^\? may be equivalent to ^? in the above regex, but the second
> > version looks to me like the representation of ASCII DEL (0x7F). 
> > Confusing.
> > 
> > It's great to have this level of scrutiny of the lexer, but please
> > don't change stuff just to achieve some kind of "optimality". 
> > Mortals like me still need to read the code.
> 
> That's why I did it. I couldn't read the code with all that
> escaping for totally unclear (actually no) reasons.
> 

The danger in removing backslashes is that the rules for where they're
needed are not consistent.  Certain characters are special in some
places, but not in others.  For example, "?" is special outside of
square brackets, i.e. [], but is not special inside them.  The minus
sign, "-", is special inside square brackets and needs a backslash
before it (unless it's the last character, i.e. immediately before the
right square bracket).  A left square bracket inside square brackets
doesn't need to be escaped, while a right square bracket _does_ need to
be escaped.
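
To make the inconsistency concrete, here is a minimal sketch in Python.
Python's re module is only a stand-in here: it is not flex, and the two
differ in other respects, but for the bracket rules described above it
behaves the same way.  The helper same_matches and the SAMPLES list are
names made up for this illustration, not anything from bogofilter.

```python
import re

# Hypothetical sample strings, chosen to exercise each special character.
SAMPLES = ["", "?", "-", "[", "]", "a", "b", "c", "a]", "\x7f"]


def same_matches(pattern_a, pattern_b, samples=SAMPLES):
    """Return True if both patterns accept exactly the same samples."""
    a, b = re.compile(pattern_a), re.compile(pattern_b)
    return all(bool(a.fullmatch(s)) == bool(b.fullmatch(s)) for s in samples)


# "?" inside brackets is literal, so the backslash is optional there.
question_mark_same = same_matches(r"[^\?]*", r"[^?]*")

# "-" between two characters forms a range unless it is escaped.
mid_dash_same = same_matches(r"[a-c]", r"[a\-c]")

# "-" immediately before the closing bracket is literal either way.
last_dash_same = same_matches(r"[ac-]", r"[ac\-]")

# "[" inside brackets needs no backslash.
left_bracket_same = same_matches(r"[a[]", r"[a\[]")

# "]" does need one: unescaped, it closes the class early.
right_bracket_same = same_matches(r"[a]]", r"[a\]]")

print(question_mark_same, mid_dash_same, last_dash_same,
      left_bracket_same, right_bracket_same)
```

Only the mid-class "-" and the "]" comparisons come out different; in
the other three cases the backslash changes nothing about what is
matched, only how the pattern reads.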

I've chosen to leave certain unnecessary backslashes in bogofilter's
flex grammar.  Unless there's a _functional_ reason (like program speed
or size or a parsing error), they _shall_ remain.

End of subject.

David