Much simplified lexer

Thu Nov 13 15:47:37 CET 2003

David Relson wrote:

>> I don't get it. It is really surprising to see this explode,
>> since I only removed rules or simplified them; some character
>> classes slightly changed their size. If I compare the last CVS
>> version David sent over the list with my version, I get this:
>> 
>>    text    data     bss     dec     hex filename
>>   42597      32   65632  108261   1a6e5 lexer_v3.cvs.o
>>   50233      32   65632  115897   1c4b9 lexer_v3.new.o

> I've attached my copy lexer_v3.l.  Since yesterday I've moved unused
> definitions into comments and made HTMLTOKEN a primary definition
> (rather than a reference to HTML_WI_COMMENT).

Right, looks great; the unified use of character classes and
the changes to a lot of those \ are also sensible. But none of
those makes a difference to the size.

Here is the change that does make the difference. You
changed this line:
<INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
to this one:
<INITIAL>(ESMTP|SMTP)+

I don't understand what this change is good for. In the
original expression the / (trailing context) seems to be wrong,
and maybe the space after "id" should also allow any kind of
whitespace. But why remove the whole tail?
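
To make the effect of the / concrete, here is a rough sketch in Python. It is only an analogy: Python's re module is not flex, and a lookahead (?=...) merely behaves similarly to flex trailing context in that the tail must follow but is not consumed. The ID definition is a hypothetical stand-in, since the real {ID} from lexer_v3.l is not shown here.

```python
import re

# Hypothetical stand-in for the lexer's {ID} definition (an assumption).
ID = r"[A-Za-z0-9]+"

# flex:  (ESMTP|SMTP)+/[ \t\n]+id\ {ID}
# The part after "/" has to follow the match but is not part of the token.
with_context = re.compile(r"(?:ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")

# flex:  (ESMTP|SMTP)+
# No tail at all, so the rule fires on any ESMTP/SMTP at this position.
without_context = re.compile(r"(?:ESMTP|SMTP)+")


def token(pattern, text):
    """Return the text the pattern would consume at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None
```

With the tail, token(with_context, "ESMTP id PAA16337") yields just "ESMTP" and token(with_context, "ESMTP server") yields nothing; without the tail, both inputs produce "ESMTP".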

Anyhow, wouldn't the following be nicer:
<INITIAL>(E?SMTP)+
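
A quick way to convince oneself that the two spellings accept the same strings is to compare them on a few samples. This is a sketch using Python's re module as a stand-in for flex; the sample list is made up for illustration and is of course not a proof.

```python
import re

original = re.compile(r"(?:ESMTP|SMTP)+")   # (ESMTP|SMTP)+
shorter = re.compile(r"(?:E?SMTP)+")        # (E?SMTP)+

# Illustrative samples only: some that should match, some that should not.
samples = ["SMTP", "ESMTP", "ESMTPSMTP", "SMTPESMTP", "EESMTP", "smtp", "ESMT", ""]


def same_language(strings):
    """True if both patterns agree on whether each whole string matches."""
    return all(
        bool(original.fullmatch(s)) == bool(shorter.fullmatch(s))
        for s in strings
    )
```

same_language(samples) holds for the strings above, which is what one would expect, since E?SMTP is just ESMTP|SMTP written with an optional E.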

And why the +? I only see it in the form "with ESMTP id
PAA16337" etc., never a repeated SMTP or ESMTP. So I would have
expected this version:
<INITIAL>E?SMTP{WHITESPACE}+id{WHITESPACE}+{ID}
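
As a sanity check of the idea (a single E?SMTP, no +, and the whole "id ..." part consumed rather than used as trailing context), here is a small Python sketch. It is written with whitespace on both sides of "id", as in the header text quoted above. WHITESPACE and ID are hypothetical stand-ins, because the real definitions from lexer_v3.l are not shown here, and Python's re is only an approximation of flex.

```python
import re

# Both definitions are assumptions made for this illustration.
WHITESPACE = r"[ \t\n]"
ID = r"[A-Za-z0-9]+"

# One E?SMTP, whitespace, "id", whitespace, then the id itself.
single = re.compile(r"E?SMTP" + WHITESPACE + r"+id" + WHITESPACE + r"+" + ID)


def consumed(text):
    """Return what the rule would consume at the start of text, or None."""
    m = single.match(text)
    return m.group(0) if m else None
```

Here consumed("ESMTP id PAA16337 for x") gives "ESMTP id PAA16337", while a repeated "ESMTPSMTP id X" or a missing space as in "ESMTP idPAA" gives nothing.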

pi