Much simplified lexer
David Relson
relson at osagesoftware.com
Thu Nov 13 22:47:12 CET 2003
On 14 Nov 2003 08:38:06 +1100
michael at optusnet.com.au wrote:
> Matthias Andree <matthias.andree at gmx.de> writes:
> > Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> writes:
> >
> > > Here is the change which does make the difference. You
> > > changed this line:
> > > <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> > > to that line:
> > > <INITIAL>(ESMTP|SMTP)+
> >
> > *I* made the change, and the CVS comment noted that it is supposed
> > to speed up the lexer. The original trailing context rule slows the
> > lexer down and doesn't seem useful.
>
> Indeed.
Do you have any sense of whether the speed difference is significant? My
guess is that we're talking about microseconds per message.
> [...]
> > > And why the +? I only see it in the form "with ESMTP id
> > > PAA16337" etc., no repeated SMTP or ESMTP. So I would have
>
> The '+' makes no sense. We're never going to see 'ESMTPESMTP' and
> even if we did, we don't want to match it there.
Not only is the "+" gone, but the special check for E?SMTP has also been
deleted (saving approx 1000 bytes in lexer_v3.o).
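For readers who want to see how the rule variants quoted in this thread
differ, here is a rough Python sketch. It is not bogofilter code. It
assumes that flex trailing context (r/s) can be approximated by a regex
lookahead, and it uses a made-up stand-in for {ID}, whose real definition
is not shown here. The names ID, OLD_RULE, NEW_RULE, NO_PLUS and matched
are hypothetical. Flex picks the longest match while Python's re tries
alternatives left to right; for these particular patterns the results
should agree.

```python
import re

# Hypothetical stand-in for the lexer's {ID} definition (an assumption;
# the real definition does not appear in this thread).
ID = r"[A-Za-z0-9]+"

# Original rule: <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
# The flex trailing context is approximated with a lookahead, so the
# token is only recognised when " id <ID>" follows, and that tail is
# not consumed.
OLD_RULE = re.compile(r"(?:ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")

# Rule after the change: <INITIAL>(ESMTP|SMTP)+
NEW_RULE = re.compile(r"(?:ESMTP|SMTP)+")

# Variant without the '+', as suggested in the thread: E?SMTP
NO_PLUS = re.compile(r"E?SMTP")


def matched(rule, text):
    """Return the text the rule matches at the start of `text`, or None."""
    m = rule.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ("ESMTP id PAA16337", "ESMTP server", "ESMTPESMTP"):
        print(repr(sample),
              matched(OLD_RULE, sample),
              matched(NEW_RULE, sample),
              matched(NO_PLUS, sample))
```

The trailing-context form only fires when the "id ..." tail is present,
the changed rule fires on any leading ESMTP/SMTP, and only the variants
with "+" swallow a repeated "ESMTPESMTP" as one match.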
> > > assumed that version:
> > > <INITIAL>E?SMTP{WHITESPACE}+{WHITESPACE}id{ID}
> >
> > We may just want to drop it altogether. If we want to drop
> > "constant" parts, say, "constant" Received: or Delivered-To: lines,
> > it'd be better to strip off the first N Received: lines.
>
> Terrible idea. The Received lines are a very rich source for
> significant tokens for me. :)
>
> Michael.
Michael,
Don't worry. I didn't take the idea seriously.
David