Understanding lexer_v3.l changes

Boris 'pi' Piwinger 3.14 at piology.org
Sun Nov 26 16:47:35 CET 2006


Hi!

I am just trying to understand the recent changes in lexer_v3.l:

:< /* $Id: lexer_v3.l,v 1.162 2005/06/27 00:40:48 relson Exp $ */
:> /* $Id: lexer_v3.l,v 1.167 2006/07/04 03:47:37 relson Exp $ */

So this is 1.0.3 vs. 1.1.1.

:< ID       <?[[:alnum:]-]*>?
:> ID       <?[[:alnum:]\-\.]*>?

What is the new dot good for? The CVS log comment is "Cleanup
queue-id processing." I am not sure what that relates to, but the
long comment at the beginning of lexer_v3.l says something about
avoiding dots.
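
To see what the extra dot changes in practice, here is a small sketch that approximates the old and new ID definitions with Python's re module. It is only an illustration, not flex itself: Python has no [[:alnum:]], so [A-Za-z0-9] stands in for it, and the sample strings are made up rather than taken from real mail headers.

```python
import re

# ASCII approximations of the two flex definitions quoted above.
# Assumption: [[:alnum:]] is replaced by [A-Za-z0-9].
OLD_ID = re.compile(r"<?[A-Za-z0-9-]*>?")
NEW_ID = re.compile(r"<?[A-Za-z0-9\-.]*>?")


def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


# Hypothetical queue-id-like strings, invented for this example.
samples = ["1GoJ8k-0004Xy-7T", "<kAQFlZ9x012345>", "<abc.def>", "4A5B6C.7D8E"]

for s in samples:
    print(f"{s!r}: old={accepts(OLD_ID, s)} new={accepts(NEW_ID, s)}")
```

With these approximations, only the strings containing a dot are treated differently: the old pattern rejects them as a whole, the new one accepts them.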

:> SHORT_TOKEN   {TOKENFRONT}{TOKENBACK}?
:> T1       [[:alpha:]]
:< TOKEN_12      ({TOKEN}|{T12})
:> TOKEN_12      ({TOKEN}|{T12}|{T1})

We now have:
T1              [[:alpha:]]
T12             [[:alpha:]][[:alnum:]]?
TOKEN_12        ({TOKEN}|{T12}|{T1})

If I am not totally wrong, a string matching T1 will also
match T12 (the second character in T12 is optional), so we
could simply drop the new addition.
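
The redundancy can be checked by brute force. The sketch below again approximates the POSIX classes with ASCII ranges (an assumption) and looks for any short string that T1 accepts in full but T12 does not. The alphabet and the helper name t1_only_strings are chosen just for this example.

```python
import re
import string
from itertools import product

# ASCII approximations of the flex definitions (assumption:
# [[:alpha:]] -> [A-Za-z], [[:alnum:]] -> [A-Za-z0-9]).
T1 = re.compile(r"[A-Za-z]")
T12 = re.compile(r"[A-Za-z][A-Za-z0-9]?")

# Test alphabet: letters, digits and two punctuation characters.
ALPHABET = string.ascii_letters + string.digits + "-."


def t1_only_strings(max_len=2):
    """Return all strings up to max_len that T1 matches in full but T12 does not."""
    found = []
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if T1.fullmatch(s) and not T12.fullmatch(s):
                found.append(s)
    return found


print(t1_only_strings())  # prints []
```

The empty result means T1 adds nothing to the alternation ({TOKEN}|{T12}|{T1}), at least as far as the set of matched strings is concerned.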

BTW, why is TOKEN not allowed to start with a digit, even
though it may contain digits inside?

:<   old: ENCODED_WORD =\?{CHARSET}\?(b\?{BASE64}|q\?{QP})\?=
:>   old: ENCODED_WORD =\?{CHARSET}\?(b\?{BASE64}\|q\?{QP})\?=
:< HTML_WO_COMMENTS      "<"[^!][^>]*">"|"<>"
:> HTML_WO_COMMENTS      "<"[^!][^>]*">"\|"<>"

Purely cosmetic.

:< <HTOKEN>{TOKEN}                                       { return TOKEN; }
:> <HTOKEN>({TOKEN}|{SHORT_TOKEN})                       { return TOKEN; }
:< {TOKEN}                                       { return TOKEN;}
:> ({TOKEN}|{SHORT_TOKEN})                               { return TOKEN;}

Why not define TOKEN like this in the first place:
{TOKENFRONT}({TOKENMID}{TOKENBACK})? and TOKENMID with a *
instead of a + at the end?
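
A quick way to check that this proposal accepts the same strings as ({TOKEN}|{SHORT_TOKEN}) is to compare both forms exhaustively over short strings. The sketch below assumes that TOKEN is currently {TOKENFRONT}{TOKENMID}{TOKENBACK} with TOKENMID ending in +. The three character classes are hypothetical stand-ins, NOT the real definitions from lexer_v3.l, so it only shows that the two shapes are equivalent, not what the real lexer accepts.

```python
import re
from itertools import product

# Hypothetical stand-ins for the real character classes in lexer_v3.l.
FRONT = r"[a-z]"      # TOKENFRONT: one leading character
MID = r"[a-z0-9.]"    # one character of TOKENMID; repetition is added below
BACK = r"[a-z0-9]"    # TOKENBACK: one trailing character

# Current form (assumed): TOKEN | SHORT_TOKEN, with TOKENMID ending in "+".
CURRENT = re.compile(rf"(?:{FRONT}{MID}+{BACK}|{FRONT}{BACK}?)")
# Proposed form: TOKENFRONT(TOKENMID TOKENBACK)?, with TOKENMID ending in "*".
PROPOSED = re.compile(rf"{FRONT}(?:{MID}*{BACK})?")


def differences(alphabet="ab01.", max_len=5):
    """Return all strings up to max_len on which the two forms disagree."""
    diffs = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(CURRENT.fullmatch(s)) != bool(PROPOSED.fullmatch(s)):
                diffs.append(s)
    return diffs


print(differences())  # prints []
```

Note that flex picks the longest match among competing rules, while this check only compares which whole strings each form accepts. Since both forms accept the same strings, the longest match would be the same as well.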

:< \${NUM}(\.{NUM})?                             { return TOKEN;}        /* Dollars and cents */
:> \${NUM}(\.{NUM})?                             { return MONEY;}        /* Dollars and cents */

What is the new return code good for? Anyhow, for me those
would be normal tokens ;-)

pi


