Much simplified lexer

David Relson relson at osagesoftware.com
Wed Nov 12 17:25:10 CET 2003


On Wed, 12 Nov 2003 15:50:25 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
> >> > Inspired by our discussion, Tom, I changed the lexer to be
> >> > more in the fashion you describe. If you want to see if it
> >> > works for you, it is attached.
> >> 
> >> How does "size lexer_v3.o" change?
> > 
> > [relson at osage src]$ ll lexer_v3.l lexer_v3.pi.1112.l
> > -rw-r--r--    1 relson   relson      11861 Nov 12 08:11 lexer_v3.l
> > -rw-rw-r--    1 relson   relson      11627 Nov 12 08:12 lexer_v3.pi.1112.l
> > 
> > [relson at osage src]$ size lexer_v3.o lexer_v3.pi.1112.o
> >    text	   data	    bss	    dec	    hex	filename
> >   41899	      8	     60	  41967	   a3ef	lexer_v3.o
> >   51610	      8	  65640	 117258	  1ca0a	lexer_v3.pi.1112.o
> > 
> > While the source file is slightly smaller (approx 230 bytes), the .o
> > file is much larger (almost 3x).
> 
> I don't get it. It is really surprising to see this explode,
> since I only removed or simplified rules and slightly changed
> the size of some character classes. If I take the last CVS
> version David sent over the list and my version, I get this:
> 
>    text    data     bss     dec     hex filename
>   42597      32   65632  108261   1a6e5 lexer_v3.cvs.o
>   50233      32   65632  115897   1c4b9 lexer_v3.new.o
> 
> pi

pi,

You've not shown the size of lexer_v3.l.  I can't explain your
lexer_v3.cvs.o size difference (unless you're using a modified copy of
lexer_v3.l rather than the cvs copy).

I've attached my copy of lexer_v3.l.  Since yesterday I've moved unused
definitions into comments and made HTMLTOKEN a primary definition
(rather than a reference to HTML_WI_COMMENT).

Below are the flex version and my sizes for lexer_v3.l,
lexer_v3.pi.1112.l, and the associated .c and .o files:

[relson at osage src]$ flex --version
flex version 2.5.4

[relson at osage src]$ ll lexer_v3*.l
-rw-r--r--    1 relson   relson      11861 Nov 12 08:11 lexer_v3.l
-rw-rw-r--    1 relson   relson      11627 Nov 12 08:12 lexer_v3.pi.1112.l

[relson at osage src]$ ll lexer_v3*.c
-rw-r--r--    1 relson   relson     101336 Nov 12 08:28 lexer_v3.c
-rw-r--r--    1 relson   relson     118227 Nov 12 11:19 lexer_v3.pi.1112.c

[relson at osage src]$ ll lexer_v3*.o
-rw-r--r--    1 relson   relson      83704 Nov 12 08:28 lexer_v3.o
-rw-r--r--    1 relson   relson      93888 Nov 12 11:20 lexer_v3.pi.1112.o

[relson at osage src]$ size lexer_v3*.o
   text	   data	    bss	    dec	    hex	filename
  40773	      8	     60	  40841	   9f89	lexer_v3.o
  50541	      8	  65640	 116189	  1c5dd	lexer_v3.pi.1112.o
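
To see where the growth comes from, the per-section figures can be compared directly. The following is only a sketch in Python that redoes the arithmetic on the numbers printed above; the helper names `total` and `section_deltas` are made up for this illustration. It shows that the dec column is text + data + bss and that most of the increase sits in bss (zero-initialized static data), with the rest in text.

```python
# Values copied from the `size` output above (built with flex 2.5.4).
SIZES = {
    "lexer_v3.o": {"text": 40773, "data": 8, "bss": 60},
    "lexer_v3.pi.1112.o": {"text": 50541, "data": 8, "bss": 65640},
}


def total(sections):
    """The 'dec' column of size(1): text + data + bss."""
    return sections["text"] + sections["data"] + sections["bss"]


def section_deltas(old, new):
    """Per-section growth in bytes from old to new."""
    return {name: new[name] - old[name] for name in ("text", "data", "bss")}


old = SIZES["lexer_v3.o"]
new = SIZES["lexer_v3.pi.1112.o"]
deltas = section_deltas(old, new)
growth = total(new) - total(old)

for name, delta in deltas.items():
    print(f"{name:>4}: {delta:+7d} bytes ({delta / growth:.0%} of the growth)")
print(f" dec: {total(old)} -> {total(new)} ({total(new) / total(old):.2f}x)")
```

The comparison says nothing about why bss grew; it only narrows down which section to look at in the generated .c file.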

-------------- next part --------------
We have a problem with the lexer's processing of MIME boundary lines.

If the boundary line immediately follows a base64-encoded line, the MIME boundary is not recognized in lexer_v3.l.  The MIME part header after it is then processed as body text.

If the boundary line follows a blank line (or plain text), all is fine.



From
Content-type: multipart/mixed; boundary="simple boundary"

--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64

dGVzdCAg

--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64

dGVzdCAg
--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64

dGVzdCAg

--simple boundary--
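
For reference, the expected result on this message can be stated with a small line-oriented check. This is a sketch in Python, not the approach lexer_v3.l takes; `find_boundaries` is a hypothetical helper, and it assumes the boundary string is already known from the Content-type header. Each line is tested on its own, so the outcome cannot depend on what the previous line contained.

```python
import base64


def find_boundaries(lines, boundary):
    """Return (line_number, kind) for every MIME delimiter line.

    A delimiter is "--" + boundary; the closing delimiter adds a trailing
    "--". Only the current line is inspected, so a preceding blank line,
    plain text line, or base64 line makes no difference.
    """
    open_delim = "--" + boundary
    close_delim = open_delim + "--"
    found = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip()  # tolerate trailing whitespace padding
        if stripped == close_delim:
            found.append((number, "close"))
        elif stripped == open_delim:
            found.append((number, "open"))
    return found


# The test message from above. The bare "From" line is kept so that the
# line numbers agree with the numbers shown in the lexer traces.
MESSAGE = """\
From
Content-type: multipart/mixed; boundary="simple boundary"

--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64

dGVzdCAg

--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64

dGVzdCAg
--simple boundary
Content-type: text/plain; charset=us-ascii
Content-Transfer-Encoding: base64

dGVzdCAg

--simple boundary--
""".splitlines()

boundaries = find_boundaries(MESSAGE, "simple boundary")
payload = base64.b64decode("dGVzdCAg")

print(boundaries)
print(payload)
```

With this numbering, the delimiter on line 15 is the one that directly follows a base64 line (line 14), which is the case the traces below examine.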


2.5.4 - never-interactive (w/o YY_GET_NEW_LINE)

    rule 246 ("simple") should be rule 206 ("--simple boundary")
    line 16 should be 'h i' state (header/initial)

	*** 14 b t  9 dGVzdCAg
	*** 15 b t 18 --simple boundary
	--accepting rule at line 246 ("simple")
	simple
	*** 16 b t 43 Content-type: text/plain; charset=us-ascii

2.5.31 - never-interactive (w/o YY_GET_NEW_LINE)

    line 16 should be 'h i' state (header/initial)
    rule 206 ("--simple boundary") is correct

	*** 14 b t  9 dGVzdCAg
	*** 15 b t 18 --simple boundary
	*** 16 b t 43 Content-type: text/plain; charset=us-ascii
	--accepting rule at line 206 ("--simple boundary

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lexer_v3.l
Type: application/octet-stream
Size: 11861 bytes
Desc: not available
URL: <https://www.bogofilter.org/pipermail/bogofilter/attachments/20031112/ca797a2e/attachment.obj>

