Much simplified lexer
David Relson
relson at osagesoftware.com
Wed Nov 12 17:25:10 CET 2003
On Wed, 12 Nov 2003 15:50:25 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson wrote:
>
> >> > Inspired by our discussion, Tom, I changed the lexer to be
> >> > more in the fashion you describe. If you want to see if it
> >> > works for you, it is attached.
> >>
> >> How does "size lexer_v3.o" change?
> >
> > [relson at osage src]$ ll lexer_v3.l lexer_v3.pi.1112.l
> > -rw-r--r-- 1 relson relson 11861 Nov 12 08:11 lexer_v3.l
> > -rw-rw-r-- 1 relson relson 11627 Nov 12 08:12 lexer_v3.pi.1112.l
> >
> > [relson at osage src]$ size lexer_v3.o lexer_v3.pi.1112.o
> >    text    data     bss     dec     hex filename
> >   41899       8      60   41967    a3ef lexer_v3.o
> >   51610       8   65640  117258   1ca0a lexer_v3.pi.1112.o
> >
> > While the source file is slightly smaller (234 bytes), the .o
> > file is much larger (almost 3x by total size)
>
> I don't get it. It is really surprising to see this explode,
> since I removed rules or simplified them; some character
> classes slightly changed their size. If I take the last CVS
> version David sent over the list and my version, I get this:
>
>    text    data     bss     dec     hex filename
>   42597      32   65632  108261   1a6e5 lexer_v3.cvs.o
>   50233      32   65632  115897   1c4b9 lexer_v3.new.o
>
> pi
pi,
You've not shown the size of lexer_v3.l. I can't explain your
lexer_v3.cvs.o size difference (unless you're using a modified copy of
lexer_v3.l rather than the cvs copy).
I've attached my copy of lexer_v3.l. Since yesterday I've moved unused
definitions into comments and made HTMLTOKEN a primary definition
(rather than a reference to HTML_WI_COMMENT).
Below is the version info for flex, along with my sizes for lexer_v3.l,
lexer_v3.pi.1112.l, and the associated .c and .o files:
[relson at osage src]$ flex --version
flex version 2.5.4
[relson at osage src]$ ll lexer_v3*.l
-rw-r--r-- 1 relson relson 11861 Nov 12 08:11 lexer_v3.l
-rw-rw-r-- 1 relson relson 11627 Nov 12 08:12 lexer_v3.pi.1112.l
[relson at osage src]$ ll lexer_v3*.c
-rw-r--r-- 1 relson relson 101336 Nov 12 08:28 lexer_v3.c
-rw-r--r-- 1 relson relson 118227 Nov 12 11:19 lexer_v3.pi.1112.c
[relson at osage src]$ ll lexer_v3*.o
-rw-r--r-- 1 relson relson 83704 Nov 12 08:28 lexer_v3.o
-rw-r--r-- 1 relson relson 93888 Nov 12 11:20 lexer_v3.pi.1112.o
[relson at osage src]$ size lexer_v3*.o
   text    data     bss     dec     hex filename
  40773       8      60   40841    9f89 lexer_v3.o
  50541       8   65640  116189   1c5dd lexer_v3.pi.1112.o
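To see where the growth comes from, the two rows of the size output above can be compared column by column. The following is only a minimal sketch in Python (it is not part of bogofilter); the helper name parse_size is made up for illustration, and the numbers are copied from the output above.

```python
# Numbers copied from the `size lexer_v3*.o` output above.
SIZE_OUTPUT = """\
text data bss dec hex filename
40773 8 60 40841 9f89 lexer_v3.o
50541 8 65640 116189 1c5dd lexer_v3.pi.1112.o
"""


def parse_size(output):
    """Parse `size` output into {filename: {column: int}} (hypothetical helper)."""
    lines = output.strip().splitlines()
    columns = lines[0].split()[:4]  # text, data, bss, dec (decimal columns only)
    table = {}
    for row in lines[1:]:
        fields = row.split()
        table[fields[-1]] = dict(zip(columns, map(int, fields[:4])))
    return table


sizes = parse_size(SIZE_OUTPUT)
old = sizes["lexer_v3.o"]
new = sizes["lexer_v3.pi.1112.o"]

# Per-segment growth of the new object file over the old one.
delta = {col: new[col] - old[col] for col in old}
# Fraction of the total growth that comes from the bss segment.
bss_share = delta["bss"] / delta["dec"]

print(delta)
print(f"bss share of growth: {bss_share:.0%}")
```

On these numbers, text grows by 9768 bytes while bss grows by 65580, so most of the difference between the two .o files is in bss rather than in text.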
-------------- next part --------------
We have a problem with the lexer's processing of MIME boundary lines.
If the boundary line immediately follows a base64-encoded line, the MIME boundary is not recognized in lexer_v3.l. The MIME part header after it is then processed as body text.
If the boundary line follows a blank line (or plain text), all is fine.
From
Content-type: multipart/mixed; boundary="simple boundary"
--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64
dGVzdCAg
--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64
dGVzdCAg
--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64
dGVzdCAg
--simple boundary--
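For comparison, the expected classification of the lines in the sample above can be sketched outside of flex. This is a minimal Python sketch, not the lexer's actual rule: classify_line and its labels are hypothetical names, and it assumes a delimiter line is "--" followed by the boundary string, with a trailing "--" on the closing delimiter, as described in RFC 2046. The point is that the result must not depend on what the previous line contained.

```python
def classify_line(line, boundary):
    """Label one line as 'boundary', 'final', or 'other' (hypothetical labels)."""
    # Trailing whitespace and line endings are ignored when matching.
    stripped = line.rstrip(" \t\r\n")
    delimiter = "--" + boundary
    if stripped == delimiter:
        return "boundary"
    if stripped == delimiter + "--":
        return "final"
    return "other"


# A base64 line directly followed by a boundary line, as in the sample above.
sample = [
    "dGVzdCAg",
    "--simple boundary",
    "Content-type: text/plain; charset=us-ascii",
    "--simple boundary--",
]

labels = [classify_line(line, "simple boundary") for line in sample]
print(labels)
```

Because each line is matched on its own, the boundary after the base64 line is labeled the same way as one that follows a blank line, which is the behavior the lexer should show as well.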
flex 2.5.4 - never-interactive (w/o YY_GET_NEW_LINE)
rule 246 ("simple") should be rule 206 ("--simple boundary")
line 16 should be 'h i' state (header/initial)
*** 14 b t 9 dGVzdCAg
*** 15 b t 18 --simple boundary
--accepting rule at line 246 ("simple")
simple
*** 16 b t 43 Content-type: text/plain; charset=us-ascii
flex 2.5.31 - never-interactive (w/o YY_GET_NEW_LINE)
line 16 should be 'h i' state (header/initial)
rule 206 ("--simple boundary") is correct
*** 14 b t 9 dGVzdCAg
*** 15 b t 18 --simple boundary
*** 16 b t 43 Content-type: text/plain; charset=us-ascii
--accepting rule at line 206 ("--simple boundary
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lexer_v3.l
Type: application/octet-stream
Size: 11861 bytes
Desc: not available
URL: <https://www.bogofilter.org/pipermail/bogofilter/attachments/20031112/ca797a2e/attachment.obj>