Much simplified lexer
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Nov 12 17:49:57 CET 2003
David Relson <relson at osagesoftware.com> wrote:
>> I don't get it. It is really surprising to see this explode,
>> since I only removed or simplified rules; some character
>> classes changed size slightly. If I compare the last CVS
>> version David sent to the list with my version, I get this:
>>
>>    text    data     bss     dec     hex filename
>>   42597      32   65632  108261   1a6e5 lexer_v3.cvs.o
>>   50233      32   65632  115897   1c4b9 lexer_v3.new.o
>
>You've not shown the size of lexer_v3.l.
That file is unchanged, but anyhow, here are the sizes:
11911 lexer_v3.l.cvs
11625 lexer_v3.l.new
>I've attached my copy lexer_v3.l. Since yesterday I've moved unused
>definitions into comments and made HTMLTOKEN a primary definition
>(rather than a reference to HTML_WI_COMMENT).
Well, I did something similar; I just removed the unused
definitions.
>[relson at osage src]$ flex --version
>flex version 2.5.4
Same here.
>[relson at osage src]$ ll lexer_v3*.l
>-rw-r--r-- 1 relson relson 11861 Nov 12 08:11 lexer_v3.l
OK, I have a different version here. I'll check that on
Friday.
>-rw-rw-r-- 1 relson relson 11627 Nov 12 08:12 lexer_v3.pi.1112.l
Two bytes are probably not the problem ;-)
>[relson at osage src]$ ll lexer_v3*.c
>-rw-r--r-- 1 relson relson 101336 Nov 12 08:28 lexer_v3.c
>-rw-r--r-- 1 relson relson 118227 Nov 12 11:19 lexer_v3.pi.1112.c
>
>[relson at osage src]$ ll lexer_v3*.o
>-rw-r--r-- 1 relson relson 83704 Nov 12 08:28 lexer_v3.o
>-rw-r--r-- 1 relson relson 93888 Nov 12 11:20 lexer_v3.pi.1112.o
I don't have those files around, but I found something similar.
>[relson at osage src]$ size lexer_v3*.o
>    text    data     bss     dec     hex filename
>   40773       8      60   40841    9f89 lexer_v3.o
>   50541       8   65640  116189   1c5dd lexer_v3.pi.1112.o
Mine is similar:
50233 32 65632 115897 1c4b9 lexer_v3.new.o
It does not exactly match your result, though; I thought the
build would be deterministic.
So you have a lexer that produces something much smaller.
I'll try to find out the reason.
>We have a problem with the lexer's processing of mime boundary lines.
>
>If the boundary line immediately follows a base64 encoded line, the mime boundary is not recognized in lexer_v3.l. The mime part header after it is then processed as body text.
>
>If the boundary line follows a blank line (or plain text), all is fine.
OK, so you obviously made some changes related to that. Were
they only in lexer_v3.l? I'll apply them to my version as well
and make it available (on Friday, sorry).
pi