lexer size [was: Much simplified lexer]
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Thu Nov 13 17:15:33 CET 2003
David Relson wrote:
> Right. This change:
>
> - <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> + <INITIAL>(ESMTP|SMTP)+
>
> has the following size effects:
>    text    data     bss     dec     hex filename
>   41547       8      60   41615    a28f lexer_v3.o
>   43405       8   65640  109053   1a9fd lexer_v3.o
Really strange how such small changes can have such huge effects.
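
To see where the growth comes from, here is a small sketch that
checks the arithmetic of the two quoted rows. It assumes they are
output of the "size" utility in the usual Berkeley column order
(text, data, bss, dec, hex); that reading is an assumption, and the
names below are only illustrative.

```python
# The two quoted rows, read as (text, data, bss, dec, hex).
# Assumption: Berkeley-style `size` output; which row is "before" and
# which is "after" the change is not stated, so they are just labeled
# by their totals.
rows = {
    "small": (41547, 8, 60, 41615, "a28f"),
    "large": (43405, 8, 65640, 109053, "1a9fd"),
}

# Sanity check: dec must equal text + data + bss, and hex must be the
# same number written in base 16.
for text, data, bss, dec, hexval in rows.values():
    assert text + data + bss == dec
    assert int(hexval, 16) == dec

small, large = rows["small"], rows["large"]
delta = {
    "text": large[0] - small[0],
    "data": large[1] - small[1],
    "bss": large[2] - small[2],
    "total": large[3] - small[3],
}

for name, value in delta.items():
    print(f"{name:>5}: {value:+d} bytes")
```

Under that reading the text segment differs by under 2 KB, while
bss (zero-initialized static storage) accounts for roughly 97% of
the difference, so the extra size is mostly table or buffer space
rather than code.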
>> I don't understand what this is good for. In the original
>> expression the / seems to be wrong, maybe the space behind
>> "id" should also be any kind of whitespace. But why
>> completely remove it?
>
> Actually Matthias made this change. The old pattern allows a multiline
> ESMTP|SMTP line, the new line does not. In his test, the simpler
> pattern works fine.
Well, then.
> Also, the "/" is a lexer operator that causes it to return the matching
> text before the slash. The text after the slash will be reparsed
> (later).
OK, so in effect we return SMTP or ESMTP.
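
A rough sketch of that behaviour, using a Python regex lookahead as
a stand-in for the lexer's "/" operator. This is only an analogy,
not flex itself, and the ID pattern is a hypothetical placeholder
because the real {ID} definition is not shown in this thread.

```python
import re

# Hypothetical stand-in for the lexer's {ID} definition (assumption).
ID = r"[A-Za-z0-9]+"

# Old rule: (ESMTP|SMTP)+/[ \t\n]+id\ {ID}
# The part after "/" is modeled as a lookahead: it has to be present,
# but it is not part of the returned text and gets scanned again later.
old_rule = re.compile(r"(?:ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")

# New rule: (ESMTP|SMTP)+ with no trailing context.
new_rule = re.compile(r"(?:ESMTP|SMTP)+")


def token(rule, text):
    """Return the text the rule would hand back as a token, or None."""
    m = rule.match(text)
    return m.group(0) if m else None


same_line = "ESMTP id PAA16337"
folded = "ESMTP\n\tid PAA16337"   # "id" continues on the next line
no_id = "ESMTP;"                  # nothing that looks like "id ..."

for sample in (same_line, folded, no_id):
    print(repr(sample), token(old_rule, sample), token(new_rule, sample))
```

In this sketch both rules hand back the same token whenever the
old one matches; the old one simply refuses to match when no
"id ..." follows.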
>> Anyhow, wouldn't the following be nicer:
>> <INITIAL>(E?SMTP)+
>
> Looks reasonable.
>
>> And why the +? I only see it in the form "with ESMTP id
>> PAA16337" etc., no repeated SMTP or ESMTP.
I still don't understand the +.
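
For illustration, a small Python regex sketch of what the +
changes. Python's re module is used here only as an approximation
of the lexer patterns, and the sample strings are made up.

```python
import re

with_plus = re.compile(r"(?:E?SMTP)+")     # (E?SMTP)+
without_plus = re.compile(r"E?SMTP")       # E?SMTP
alternation = re.compile(r"ESMTP|SMTP")    # (ESMTP|SMTP), no +


def matched(rule, text):
    """Return the prefix of text consumed by rule, or None if no match."""
    m = rule.match(text)
    return m.group(0) if m else None


# Made-up inputs: the first two are the forms seen in Received lines,
# the last two are hypothetical repetitions.
samples = ["SMTP", "ESMTP", "SMTPSMTP", "ESMTPSMTP"]

for s in samples:
    print(s, matched(with_plus, s), matched(without_plus, s))
```

On single occurrences the patterns agree, and E?SMTP accepts the
same strings as ESMTP|SMTP. The + only matters for back-to-back
repetitions such as "SMTPSMTP", where it swallows the whole run as
one token.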
>> So I would have assumed that version:
>> <INITIAL>E?SMTP{WHITESPACE}+{WHITESPACE}id{ID}
>
> Appears to be unnecessary. Without it, "make check" still succeeds so
> the effect of the change is minor.
I ran my own test. Using
<INITIAL>E?SMTP
in my version of the lexer does not change any score, and it
gets the size down.
The same pattern in your version of the lexer passes "make check".
pi
PS: Attached is my new version of the lexer.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lexer_v3.l.new
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20031113/5f5f74c9/attachment.ksh>