lexer size [was: Much simplified lexer]

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Nov 13 17:15:33 CET 2003


David Relson wrote:

> Right.  This change:
> 
> - <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}  
> + <INITIAL>(ESMTP|SMTP)+
> 
> has the following size effects:

>    text	   data	    bss	    dec	    hex	filename
>   41547	      8	     60	  41615	   a28f	lexer_v3.o
>   43405	      8	  65640	 109053	  1a9fd	lexer_v3.o

Really strange how such a small change can have such a huge effect.
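
Subtracting the two rows shows where the growth sits. This is only a sketch of the arithmetic, assuming the columns are the usual text, data, bss, dec, hex output of size(1); the names "small" and "large" are just labels for the two rows.

```python
# Columns assumed to follow the usual `size` layout: text, data, bss.
small = {"text": 41547, "data": 8, "bss": 60}
large = {"text": 43405, "data": 8, "bss": 65640}

# Per-section growth between the two object files.
delta = {section: large[section] - small[section] for section in small}

total_small = sum(small.values())   # 41615, matches the "dec" column
total_large = sum(large.values())   # 109053, matches the "dec" column

print(delta)                        # {'text': 1858, 'data': 0, 'bss': 65580}
print(total_large - total_small)    # 67438
```

Almost all of the difference is bss, that is, zero-initialised static storage, while the code and tables in text grow by less than 2 KB.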

>> I don't understand what this is good for. In the original
>> expression the / seems to be wrong, and maybe the space after
>> "id" should also be any kind of whitespace. But why
>> remove it completely?
> 
> Actually Matthias made this change.  The old pattern allows a multiline
> ESMTP|SMTP line, the new line does not.  In his test, the simpler
> pattern works fine.

Well, then.

> Also, the "/" is a lexer operator that causes it to return the matching
> text before the slash.  The text after the slash will be reparsed
> (later).

OK, so in effect we return SMTP or ESMTP.
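
The trailing context can be approximated outside the lexer with a regex lookahead: the part after the slash must be present, but it is not part of the returned text. The following is a rough sketch in Python, not flex itself. The ID pattern is a hypothetical stand-in, since the lexer's {ID} definition is not quoted here, and search() is used instead of flex's matching at the current input position.

```python
import re

# Hypothetical stand-in for the lexer's {ID} definition (not shown in this thread).
ID = r"[A-Za-z0-9]+"

# flex `r/s` approximated by a lookahead: s must follow r but is not consumed.
with_context = re.compile(r"(?:ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")
without_context = re.compile(r"(?:ESMTP|SMTP)+")

line = "with ESMTP id PAA16337"
m1 = with_context.search(line)
m2 = without_context.search(line)
print(m1.group(0), m2.group(0))        # ESMTP ESMTP

# The whitespace before "id" may include a line break (a folded header).
folded = "with ESMTP\n\tid PAA16337"
print(with_context.search(folded).group(0))   # ESMTP

# Without a following "id ...", only the simpler pattern matches.
other = "with ESMTP; Thu, 13 Nov 2003"
print(with_context.search(other))             # None
print(without_context.search(other).group(0)) # ESMTP
```

In both cases where the longer pattern matches, the returned text is the same as with the short pattern; the context only restricts where a match is allowed.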

>> Anyhow, wouldn't the following be nicer:
>> <INITIAL>(E?SMTP)+
> 
> Looks reasonable.
> 
>> And why the +? I only see it in the form "with ESMTP id
>> PAA16337" etc., no repeated SMTP or ESMTP.

I still don't understand the +.
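
To make the question concrete: the + only matters for input where SMTP or ESMTP is repeated without anything in between. A small sketch with Python's re module (again an approximation of the flex patterns, with the start condition left out):

```python
import re

repeated = re.compile(r"(?:ESMTP|SMTP)+")   # pattern with the +
single = re.compile(r"E?SMTP")              # proposed shorter form

results = {}
for text in ("SMTP", "ESMTP", "SMTPESMTP"):
    # fullmatch: does the pattern cover the whole string as one token?
    results[text] = (bool(repeated.fullmatch(text)), bool(single.fullmatch(text)))
    print(text, results[text])
# SMTP (True, True)
# ESMTP (True, True)
# SMTPESMTP (True, False)
```

For the forms that actually show up in Received lines, "SMTP" and "ESMTP", the two patterns accept the same text. They differ only on a run like "SMTPESMTP".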

>> So I would have expected this version:
>> <INITIAL>E?SMTP{WHITESPACE}+{WHITESPACE}id{ID}
> 
> Appears to be unnecessary.  Without it, "make check" still succeeds so
> the effect of the change is minor.

I ran my own test. The pattern

<INITIAL>E?SMTP

in my version of the lexer does not change any score, and it
brings the size down.

The same pattern in your version of the lexer passes "make check".

pi

PS: Attached is my new version of lexer.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lexer_v3.l.new
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20031113/5f5f74c9/attachment.ksh>

