lexer change

David Relson relson at osagesoftware.com
Tue Nov 4 15:25:36 CET 2003


Greetings,

My thanks to Boris "pi" Piwinger for the work he's put in on
bogofilter's lexer code.  He tested several sets of changes and some
have been applied to bogofilter.  Here are the details:

1 - remove unnecessary backslashes and reorder TOKEN... patterns.

    As they make _no_ difference to the generated code, most of these
changes have been applied.  I kept the backslashes that precede quotes
or square brackets, since removing them affects code colorizing in emacs.

2 - acceptance of digits at the beginning of tokens and acceptance of
numbers as tokens

    Rejected.  I don't see value in this change.

3 - acceptance of two character tokens.

    Rejected pending further evaluation.

4 - Removal of the {1,70} repetition count in the TOKEN pattern.

    Accepted.  This is the biggie!  

    With this change the generated lexer_v3.c file shrinks from 1.8M to
1.2M and a stripped bogofilter executable shrinks from 1.8M to 1.4M.

    AFAICT, this change doesn't alter parsing results.  Time will tell
if there is an effect that hasn't yet been detected.
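
    A rough way to see why the results should be unchanged: every
character TOKENBACK accepts is also accepted by TOKENMID, so any extra
trailing TOKENBACK characters can be absorbed by TOKENMID+ instead.  The
Python sketch below checks this on ASCII-only approximations of the
patterns.  The approximated character classes, the helper names, and the
use of Python's re module in place of flex are all assumptions made for
illustration; none of this is bogofilter code.

```python
import itertools
import re

# ASCII-only approximations of the flex classes in the patch (assumptions:
# [:blank:] is space/tab, [:cntrl:] is \x00-\x1f plus \x7f, and TOKENFRONT
# reduces to ASCII letters).
FRONT = r'[A-Za-z]'
MID = r'[^ \t<>;=():&%$#@+|/\\{}^"?*,\x00-\x1f\x7f\[\]]'
BACK = r'[^ \t<>;=():&%$#@+|/\\{}^"?*,\x00-\x1f\x7f\[\]._+\-]'

# TOKEN before and after the patch.
OLD_TOKEN = re.compile(FRONT + MID + '+' + BACK + '{1,70}')
NEW_TOKEN = re.compile(FRONT + MID + '+' + BACK)


def back_is_subset_of_mid():
    """True if every ASCII character accepted by BACK is accepted by MID."""
    mid, back = re.compile(MID), re.compile(BACK)
    return all(mid.fullmatch(chr(c))
               for c in range(128) if back.fullmatch(chr(c)))


def same_language(alphabet, max_len):
    """Brute force: do both patterns accept the same strings up to max_len?"""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = ''.join(chars)
            if bool(OLD_TOKEN.fullmatch(s)) != bool(NEW_TOKEN.fullmatch(s)):
                return False
    return True


if __name__ == "__main__":
    print(back_is_subset_of_mid())
    print(same_language("a1.-_ @", 5))
```

    Both checks print True here.  A brute-force comparison over short
strings is only a sanity test, not a proof about the scanner flex
generates, so the caveat above still applies.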

To summarize, the changes that help human readability and that reduce
program size have been accepted.  Changes that affect bogofilter results
have not been accepted.

I've attached a patch containing the applied changes.

David
-------------- next part --------------
[relson at osage src]$ cvs diff lexer_v3.l
Index: lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.106
diff -u -r1.106 lexer_v3.l
--- lexer_v3.l	29 Oct 2003 15:09:25 -0000	1.106
+++ lexer_v3.l	4 Nov 2003 14:10:09 -0000
@@ -134,11 +134,11 @@
 MSG_COUNT	^\"\.MSG_COUNT\"
 
 TOKENFRONT	[^[:blank:][:cntrl:][:digit:][:punct:]]
-TOKENMID	[^[:blank:]<>;=():&%$#@+|/\\{}^\"\?\*,[:cntrl:]\[\]]+
-BOGOLEX_TOKEN	[^[:blank:]<>;    &%  @ |/\\{}^\"  \*,[:cntrl:]\[\]]+
-TOKENBACK	[^[:blank:]<>;=():&%$#@+|/\\{}^\"\?\*\._\-\+,\[\][:cntrl:]]
+TOKENMID	[^[:blank:]<>;=():&%$#@+|/\\{}^\"?*,[:cntrl:][\]]+
+BOGOLEX_TOKEN	[^[:blank:]<>;    &%  @ |/\\{}^\" *,[:cntrl:][\]]+
+TOKENBACK	[^[:blank:]<>;=():&%$#@+|/\\{}^\"?*,[:cntrl:][\]._+-]
 
-TOKEN		{TOKENFRONT}{TOKENMID}{TOKENBACK}{1,70}
+TOKEN		{TOKENFRONT}{TOKENMID}{TOKENBACK}
 TOKEN_12 	({TOKEN}|{A2}|{A1})
 
 BASE64		[0-9a-zA-Z/+=]+



