Much simplified lexer

michael at optusnet.com.au
Fri Nov 14 02:24:36 CET 2003


David Relson <relson at osagesoftware.com> writes:
> On 14 Nov 2003 08:38:06 +1100
> michael at optusnet.com.au wrote:
> 
> > Matthias Andree <matthias.andree at gmx.de> writes:
> > > Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> writes:
> > > 
> > > > Here is the change which does make the difference. You
> > > > changed this line:
> > > > <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> > > > to that line:
> > > > <INITIAL>(ESMTP|SMTP)+
[..] 
> Have you any sense whether the speed difference is significant? My
> guess is that we're talking about microseconds per message.

Less than that. A flex scanner normally runs in constant time per input
character. (Flex just builds a bloody big character-driven state
machine. The complexity of the rules changes the size of the state
machine, but it won't normally change the execution time measurably.)
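To illustrate the constant-time-per-character point, here is a minimal hand-built DFA in C for the pattern (ESMTP|SMTP). It is not flex output; the state names, build_table() and dfa_match() are hypothetical. The scanner does exactly one table lookup per input byte, so adding alternatives to the pattern grows the table, not the per-character work.

```c
#include <assert.h>
#include <string.h>

/* States of a small DFA for (ESMTP|SMTP). DEAD is absorbing. */
enum { START, GOT_E, GOT_S, GOT_SM, GOT_SMT, ACCEPT, DEAD, N_STATES };

static unsigned char next_state[N_STATES][256];

/* Fill the transition table once; every unlisted transition goes to DEAD. */
static void build_table(void)
{
    memset(next_state, DEAD, sizeof next_state);
    next_state[START]['E']   = GOT_E;
    next_state[START]['S']   = GOT_S;
    next_state[GOT_E]['S']   = GOT_S;   /* "ES" continues like "S" */
    next_state[GOT_S]['M']   = GOT_SM;
    next_state[GOT_SM]['T']  = GOT_SMT;
    next_state[GOT_SMT]['P'] = ACCEPT;
}

/* One table lookup per input character, independent of pattern size.
 * Returns 1 if the whole string is ESMTP or SMTP, else 0. */
static int dfa_match(const char *s)
{
    int state = START;

    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        state = next_state[state][*p];
    return state == ACCEPT;
}
```

After calling build_table() once, dfa_match("ESMTP") and dfa_match("SMTP") return 1, while dfa_match("ESMTPESMTP") returns 0 because the second 'E' sends the DFA into the dead state.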
 
> > [...] 
> > > > And why the +? I only see it in the form "with ESMTP id
> > > > PAA16337" etc., no repeated SMTP or ESMTP. So I would have
> > 
> > The '+' makes no sense. We're never going to see 'ESMTPESMTP' and
> > even if we did, we don't want to match it there.
> 
> Not only is the "+" gone, but the special check for E?SMTP has also been
> deleted (saving approx 1000 bytes in lexer_v3.o).

Excellent. I'm assuming that ESMTP still appears as a token, though?
It's a significant token in my database.

rcvd:ESMTP            10415   18529  0.375495  0.375840
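One way ESMTP can still surface as rcvd:ESMTP without a dedicated rule: if Received header text is split into generic words and each word is prefixed with "rcvd:", ESMTP falls out as an ordinary token. This is a rough sketch, not bogofilter's lexer; the token alphabet, MAX_TOKEN and rcvd_tokens() are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN 64   /* hypothetical per-token buffer size */

/* Hypothetical token alphabet: ASCII letters, digits, '.' and '-'. */
static const char TOKEN_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.-";

/* Split a Received: header value into words and store each one as
 * "rcvd:<word>" in out. Returns the number of tokens stored (at most max).
 * Words too long for MAX_TOKEN are truncated by snprintf. */
static size_t rcvd_tokens(const char *line, char out[][MAX_TOKEN], size_t max)
{
    size_t n = 0;

    assert(line != NULL);
    while (*line && n < max) {
        line += strcspn(line, TOKEN_CHARS);      /* skip separators */
        size_t len = strspn(line, TOKEN_CHARS);  /* length of next word */
        if (len == 0)
            break;                               /* only separators left */
        snprintf(out[n], MAX_TOKEN, "rcvd:%.*s", (int)len, line);
        n++;
        line += len;
    }
    return n;
}
```

For "from mx.example.com by relay with ESMTP id PAA16337;" this produces eight tokens, the sixth of which is rcvd:ESMTP.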

[..]
> > Terrible idea. The Received lines are a very rich source for
> > significant tokens for me. :)
[..] 
> Don't worry.  I didn't take the idea seriously.
> 

Laugh. Good. 

Michael.

More information about the Bogofilter mailing list