Much simplified lexer

Wed Nov 12 17:49:57 CET 2003

David Relson <relson at osagesoftware.com> wrote:

>> I don't get it. It is really surprising to see this explode,
>> since I only removed or simplified rules, and some character
>> classes slightly changed their size. If I take the last CVS
>> version David sent to the list and my version, I get this:
>> 
>>    text    data     bss     dec     hex filename
>>   42597      32   65632  108261   1a6e5 lexer_v3.cvs.o
>>   50233      32   65632  115897   1c4b9 lexer_v3.new.o
>
>You've not shown the size of lexer_v3.l.  

Those were nearly unchanged, but anyhow, here they are:
11911 lexer_v3.l.cvs
11625 lexer_v3.l.new

>I've attached my copy of lexer_v3.l.  Since yesterday I've moved unused
>definitions into comments and made HTMLTOKEN a primary definition
>(rather than a reference to HTML_WI_COMMENT).

Well, I have done something similar; I just removed the unused
definitions.

>[relson at osage src]$ flex --version
>flex version 2.5.4

Same here.

>[relson at osage src]$ ll lexer_v3*.l
>-rw-r--r--    1 relson   relson      11861 Nov 12 08:11 lexer_v3.l

OK, I have a different version here. I'll check that on
Friday.

>-rw-rw-r--    1 relson   relson      11627 Nov 12 08:12 lexer_v3.pi.1112.l

Two bytes are probably not the problem;-)

>[relson at osage src]$ ll lexer_v3*.c
>-rw-r--r--    1 relson   relson     101336 Nov 12 08:28 lexer_v3.c
>-rw-r--r--    1 relson   relson     118227 Nov 12 11:19 lexer_v3.pi.1112.c
>
>[relson at osage src]$ ll lexer_v3*.o
>-rw-r--r--    1 relson   relson      83704 Nov 12 08:28 lexer_v3.o
>-rw-r--r--    1 relson   relson      93888 Nov 12 11:20 lexer_v3.pi.1112.o

I don't have them around but found something similar.

>[relson at osage src]$ size lexer_v3*.o
>   text	   data	    bss	    dec	    hex	filename
>  40773	      8	     60	  40841	   9f89	lexer_v3.o
>  50541	      8	  65640	 116189	  1c5dd	lexer_v3.pi.1112.o

Mine is similar:
50233      32   65632  115897   1c4b9 lexer_v3.new.o
It is not exactly your result, though. I thought it would be
deterministic.

So you have a lexer which produces something much smaller.
I'll try to find out what the reason is.

>We have a problem with the lexer's processing of mime boundary lines.  
>
>If the boundary line immediately follows a base64 encoded line, the
>mime boundary is not recognized in lexer_v3.l.  The mime part header
>after it is then processed as body text.
>
>If the boundary line follows a blank line (or plain text), all is fine.

OK, so you obviously made some changes for that. Was this
only in lexer_v3.l? I'll also apply those to my version and
will make it available (Friday, sorry).
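
To make sure we mean the same thing by "boundary line", here is a
minimal sketch in C of the check as I understand it from your
description. It is not taken from lexer_v3.l or any other bogofilter
source; the function name mime_boundary_kind() and its return codes are
made up for illustration, and it assumes the boundary string from the
Content-Type header is already known. The point is only that the test
looks at the line itself, so it should give the same answer whether the
previous line was base64 data, plain text, or blank.

```c
#include <string.h>

/* Hypothetical helper, not bogofilter code.
 * Returns 1 for a part delimiter      ("--" boundary),
 *         2 for the closing delimiter ("--" boundary "--"),
 *         0 for anything else. */
int mime_boundary_kind(const char *line, const char *boundary)
{
    size_t blen = strlen(boundary);
    int kind = 1;

    /* A delimiter line always starts with two dashes. */
    if (blen == 0 || strncmp(line, "--", 2) != 0)
        return 0;
    line += 2;

    /* The boundary string must follow immediately. */
    if (strncmp(line, boundary, blen) != 0)
        return 0;
    line += blen;

    /* Two more dashes mark the end of the last part. */
    if (strncmp(line, "--", 2) == 0) {
        kind = 2;
        line += 2;
    }

    /* Only trailing blanks and the line terminator may remain. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\r')
        line++;
    if (*line == '\n')
        line++;

    return *line == '\0' ? kind : 0;
}
```

Under that assumption, each raw line would be passed through this check
before it is handed to the base64 decoder, so a boundary directly after
encoded data is seen as a boundary and not as more data.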

pi