ALPHA [was: lexer change]
David Relson
relson at osagesoftware.com
Mon Nov 10 22:15:04 CET 2003
On Mon, 10 Nov 2003 17:35:40 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> Me again ...
>
> There are some things I don't understand in the lexer, maybe
> they also can be simplified:
>
> ALPHA is never used and is identical to A2.
>
> A2 is defined as [[:alpha:]][[:alnum:]]+ which is AFAICS the
> same as [[:alnum:]]+. Is that correct? Or does it mean: 1
> alpha character followed by at least one alnum character?
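They are different. The first requires a leading letter followed by at
least one letter or digit; the second matches any run of letters and
digits, including a single character or one that starts with a digit.
Read the flex documentation or create a test lexer and test it.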
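
For a quick check without building a lexer, the two patterns can be
compared in a small Python sketch. Python's re module has no POSIX
bracket classes, so [A-Za-z] and [A-Za-z0-9] stand in for [[:alpha:]]
and [[:alnum:]] here (an assumption that only holds for ASCII input in
the C locale). The names below are illustrative and are not taken from
the bogofilter sources.

```python
import re

# ASCII stand-ins for the POSIX classes (assumption: C locale, ASCII input).
ALPHA = "[A-Za-z]"
ALNUM = "[A-Za-z0-9]"

a2_as_written = re.compile(f"{ALPHA}{ALNUM}+")  # [[:alpha:]][[:alnum:]]+
alnum_run = re.compile(f"{ALNUM}+")             # [[:alnum:]]+

def matches(pattern, text):
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

# Print, for each sample, whether each pattern accepts it.
for text in ["ab", "a1", "a", "1a", "42"]:
    print(text, matches(a2_as_written, text), matches(alnum_run, text))
```

Strings such as "a", "1a" and "42" are accepted by the second pattern
only, which is where the two definitions part ways.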
There is an error. The trailing "+" does not belong in either A1 or A2.
Also, ALPHA is no longer needed.
A1 is needed for the places where a single letter needs to be identified
for use in a token, and A2 is needed for a single letter followed by a
letter or a digit. An example is a token split by an HTML comment, e.g.
"T<!xxx>ha<!xx>t".
I have corrected the problems and updated CVS.
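
As a rough illustration of the intended definitions, the Python sketch
below assumes that A1 is [[:alpha:]] and A2 is [[:alpha:]][[:alnum:]]
once the trailing "+" is dropped. That is inferred from the description
above, not copied from CVS. [A-Za-z] and [A-Za-z0-9] stand in for the
POSIX classes, and COMMENT and classify() are hypothetical helpers that
only serve to split the example string; they do not show how the lexer
itself handles comments.

```python
import re

# Assumed corrected definitions (ASCII stand-ins for the POSIX classes).
A1 = re.compile(r"[A-Za-z]")             # [[:alpha:]]
A2 = re.compile(r"[A-Za-z][A-Za-z0-9]")  # [[:alpha:]][[:alnum:]]

# Hypothetical pattern for the "<!...>" comments in the example string.
COMMENT = re.compile(r"<![^>]*>")

def classify(fragment):
    """Name the definition that matches the whole fragment, if any."""
    if A1.fullmatch(fragment):
        return "A1"
    if A2.fullmatch(fragment):
        return "A2"
    return None

# Split the example token at its comments and label each piece.
fragments = COMMENT.split("T<!xxx>ha<!xx>t")
labels = [(f, classify(f)) for f in fragments]
print(labels)
```

Under these assumptions the pieces "T" and "t" fall under A1 and "ha"
under A2, so the two definitions cover different fragments.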
> In either case: TOKEN_12 is the only place where A1 and A2
> are used. Since anything of the form A1 is also of the form
> A2, it would be sufficient to define TOKEN_12 as
> ({TOKEN}|{A2}).
"a" is of form A1 but not A2.
> I don't understand what the use of TOKEN_12 is.
>
> What is BOGO_LEX doing?
BOGOLEX is used for msg-count format files, as the format was originally
called the bogolex format.
> Can someone explain those things to me, please?
pi,
Thanks for your close reading of the code. It has been very helpful in
spotting code that _looks_ ok (on casual inspection) but is actually
incorrect.
David