Much simplified lexer
michael at optusnet.com.au
Fri Nov 14 02:24:36 CET 2003
David Relson <relson at osagesoftware.com> writes:
> On 14 Nov 2003 08:38:06 +1100
> michael at optusnet.com.au wrote:
>
> > Matthias Andree <matthias.andree at gmx.de> writes:
> > > Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> writes:
> > >
> > > > Here is the change which does make the difference. You
> > > > changed this line:
> > > > <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> > > > to that line:
> > > > <INITIAL>(ESMTP|SMTP)+
[..]
> Have you any sense whether the speed difference is significant? My
> guess is that we're talking about microseconds per message.
Less. Flex is normally constant time per character. (It just
builds a bloody big character-driven state machine. The parsing
complexity will change the size of the state machine, but it
won't normally change the execution time measurably.)
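
To make that concrete, here is a rough sketch of a table-driven
scanner. It is an illustration only, written in Python rather than
the C that flex generates, and it is not flex's actual algorithm
(no trailing context, no backing up). The names build_table and
scan are made up for this example. The point it shows: adding more
patterns makes the table bigger, but each input character still
costs one table lookup.

```python
# Hypothetical sketch of a table-driven scanner; not flex's generated code.

def build_table(keywords):
    """Build a transition table: table[state][char] -> next state."""
    table = [{}]      # state 0 is the start state
    accepting = {}    # state -> keyword recognised on reaching that state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in table[state]:
                table.append({})
                table[state][ch] = len(table) - 1
            state = table[state][ch]
        accepting[state] = word
    return table, accepting


def scan(text, table, accepting):
    """Feed text through the table, doing one lookup per character.

    Returns (token, lookups): the keyword that the whole text matches
    (or None), and how many table lookups were performed.
    """
    state = 0
    lookups = 0
    for ch in text:
        lookups += 1
        state = table[state].get(ch)
        if state is None:   # dead state: no keyword continues this way
            return None, lookups
    return accepting.get(state), lookups


small = build_table(["SMTP"])
big = build_table(["SMTP", "ESMTP", "HTTP", "LMTP"])

# The bigger pattern set needs more states ...
print(len(small[0]), len(big[0]))
# ... but scanning the same input costs the same number of lookups.
print(scan("SMTP", *small), scan("SMTP", *big))
```

With this toy table the number of states grows with the pattern
set, while the lookup count for a given input depends only on its
length.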
> > [...]
> > > > And why the +? I only see it in the form "with ESMTP id
> > > > PAA16337" etc., no repeated SMTP or ESMTP. So I would have
> >
> > The '+' makes no sense. We're never going to see 'ESMTPESMTP' and
> > even if we did, we don't want to match it there.
>
> Not only is the "+" gone, but the special check for E?SMTP has also been
> deleted (saving approx 1000 bytes in lexer_v3.o).
Excellent. I'm assuming that ESMTP still appears as a token though?
It's a significant token in my dbase.
rcvd:ESMTP 10415 18529 0.375495 0.375840
[..]
> > Terrible idea. The Received lines are a very rich source for
> > significant tokens for me. :)
[..]
> Don't worry. I didn't take the idea seriously.
>
Laugh. Good.
Michael.