Much simplified lexer

Thu Nov 13 22:47:12 CET 2003

On 14 Nov 2003 08:38:06 +1100
michael at optusnet.com.au wrote:

> Matthias Andree <matthias.andree at gmx.de> writes:
> > Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> writes:
> > 
> > > Here is the change which does make the difference. You
> > > changed this line:
> > > <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> > > to that line:
> > > <INITIAL>(ESMTP|SMTP)+
> > 
> > *I* made the change, and the CVS comment noted that it is supposed
> > to speed up the lexer. The original trailing context rule slows the
> > lexer down and doesn't seem useful.
> 
> Indeed.

Have you any sense whether the speed difference is significant?  My
guess is that we're talking about microseconds per message.

> [...] 
> > > And why the +? I only see it in the form "with ESMTP id
> > > PAA16337" etc., no repeated SMTP or ESMTP. So I would have
> 
> The '+' makes no sense. We're never going to see 'ESMTPESMTP' and
> even if we did, we don't want to match it there.

Not only is the "+" gone, but the special check for E?SMTP has also been
deleted (saving approx 1000 bytes in lexer_v3.o).
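
For anyone following along, the difference between the three spellings
discussed above can be sketched outside of flex. This is only an
approximation: it uses Python's re module, models the flex trailing
context "r/s" with a lookahead, and assumes a stand-in definition for
{ID}, since the lexer's real definition is not quoted in this thread.
The helper name matched() is made up for the illustration.

```python
import re

# ASSUMPTION: stand-in for the lexer's {ID} definition, which is not shown here.
ID = r"[A-Za-z0-9]+"

# Original rule: (ESMTP|SMTP)+/[ \t\n]+id\ {ID}
# flex trailing context r/s is approximated by the lookahead (?=s).
with_context = re.compile(r"(?:ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")

# Changed rule: (ESMTP|SMTP)+
without_context = re.compile(r"(?:ESMTP|SMTP)+")

# Spelling without the '+': E?SMTP
single = re.compile(r"E?SMTP")


def matched(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


received = "with ESMTP id PAA16337"
elsewhere = "ESMTP service ready"

print(matched(with_context, received))        # token only when "id ..." follows
print(matched(with_context, elsewhere))       # no trailing "id ...", so no match
print(matched(without_context, elsewhere))    # matches regardless of what follows
print(matched(without_context, "ESMTPESMTP")) # the '+' swallows the repetition
print(matched(single, "ESMTPESMTP"))          # E?SMTP stops after one occurrence
```

The sketch shows matching behaviour only; it says nothing about the
speed difference, which depends on how flex handles trailing context
in its generated tables.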

> > > assumed that version:
> > > <INITIAL>E?SMTP{WHITESPACE}+{WHITESPACE}id{ID}
> > 
> > We may just want to drop it altogether. If we want to drop
> > "constant" parts, say, "constant" Received: or Delivered-To: lines,
> > it'd be better to strip off the first N Received: lines.
> 
> Terrible idea. The Received lines are a very rich source for
> significant tokens for me. :)
> 
> Michael.

Michael,

Don't worry.  I didn't take the idea seriously.

David