Much simplified lexer

michael at optusnet.com.au
Thu Nov 13 22:38:06 CET 2003


Matthias Andree <matthias.andree at gmx.de> writes:
> Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> writes:
> 
> > Here is the change which does make the difference. You
> > changed this line:
> > <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> > to that line:
> > <INITIAL>(ESMTP|SMTP)+
> 
> *I* made the change, and the CVS comment noted that it is supposed to
> speed up the lexer. The original trailing context rule slows the lexer
> down and doesn't seem useful.

Indeed.
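
To make the difference between the two rules concrete, here is a rough
sketch in Python rather than flex. It rests on several assumptions: Python's
`re` lookahead `(?=...)` only approximates flex's trailing context
operator `/`, the `ID` pattern is a hypothetical stand-in for the lexer's
real {ID} definition, and the sample header fragments are made up for
illustration. It says nothing about speed, only about which inputs each
rule accepts.

```python
import re

# Hypothetical stand-in for the lexer's {ID} definition (an assumption).
ID = r"[A-Za-z0-9]+"

# Old rule: trailing context, approximated here with a lookahead.
OLD_RULE = re.compile(r"(ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")
# New rule: same token pattern, no trailing context.
NEW_RULE = re.compile(r"(ESMTP|SMTP)+")


def matched_text(rule, text):
    """Return what the rule matches at the start of text, or None."""
    m = rule.match(text)
    return m.group(0) if m else None


with_id = "ESMTP id PAA16337"
without_id = "ESMTP; Thu, 13 Nov 2003"

for sample in (with_id, without_id):
    print(repr(sample),
          "old:", matched_text(OLD_RULE, sample),
          "new:", matched_text(NEW_RULE, sample))
```

In this sketch both rules return the same token when "id <something>"
follows. The old rule refuses to match when it does not, while the new one
matches in either case.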

[...] 
> > And why the +? I only see it in the form "with ESMTP id
> > PAA16337" etc., no repeated SMTP or ESMTP. So I would have

The '+' makes no sense. We're never going to see 'ESMTPESMTP' and
even if we did, we don't want to match it there.
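
A quick way to see what the '+' changes, again sketched in Python rather
than flex (an approximation: Python's `re` is not the flex engine, and the
doubled input is an artificial string, not something seen in real mail):

```python
import re

# Pattern as it stands in the lexer rule, with the '+'.
WITH_PLUS = re.compile(r"(ESMTP|SMTP)+")
# Pattern without repetition, as in the E?SMTP suggestion.
WITHOUT_PLUS = re.compile(r"E?SMTP")


def lexeme(rule, text):
    """Return what the rule matches at the start of text, or None."""
    m = rule.match(text)
    return m.group(0) if m else None


normal = "ESMTP id PAA16337"
doubled = "ESMTPESMTP id PAA16337"  # artificial input

for sample in (normal, doubled):
    print(repr(sample),
          "with +:", lexeme(WITH_PLUS, sample),
          "without +:", lexeme(WITHOUT_PLUS, sample))
```

On the normal input the two patterns agree. They differ only on the doubled
input, where the '+' version takes the whole 'ESMTPESMTP' run as one match.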

> > assumed that version:
> > <INITIAL>E?SMTP{WHITESPACE}+{WHITESPACE}id{ID}
> 
> We may just want to drop it altogether. If we want to drop "constant"
> parts, say, "constant" Received: or Delivered-To: lines, it'd be better
> to strip off the first N Received: lines.

Terrible idea. The Received lines are a very rich source of significant
tokens for me. :)

Michael.



