ALPHA [was: lexer change]

David Relson relson at osagesoftware.com
Mon Nov 10 22:15:04 CET 2003


On Mon, 10 Nov 2003 17:35:40 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> Me again ...
> 
> There are some things I don't understand in the lexer, maybe
> they also can be simplified:
> 
> ALPHA is never used and is identical to A2.
> 
> A2 is defined as [[:alpha:]][[:alnum:]]+, which is AFAICS the
> same as [[:alnum:]]+. Is that correct? Or does it mean: one
> alpha character followed by at least one alnum character?

They are different.  Read the flex documentation, or create a test lexer
and try it.


There is an error.  The trailing "+" does not belong in either A1 or A2.
Also, ALPHA is no longer needed.

A1 is needed where a single letter must be identified for use in a
token, and A2 is needed for a single letter followed by a letter or a
digit.  An example is a token split by HTML comments, e.g.
"T<!xxx>ha<!xx>t".

I have corrected the problems and updated CVS.

> In either case: TOKEN_12 is the only place where A1 and A2
> are used. Since anything of the form A1 is also of the form
> A2, it would be sufficient to define TOKEN_12 as
> ({TOKEN}|{A2}).

"a" is of form A1 but not A2.

> I don't understand what the use of TOKEN_12 is.
> 
> What is BOGO_LEX doing?

BOGOLEX is used for msg-count format files, as the format was originally
called the bogolex format.

> Can someone explain those things to me, please?

pi,

Thanks for your close reading of the code.  It has been very helpful in
spotting code that _looks_ ok (on casual inspection) but is actually
incorrect.

David





More information about the bogofilter-dev mailing list