Much simplified lexer

David Relson relson at osagesoftware.com
Thu Nov 13 19:21:04 CET 2003


On Thu, 13 Nov 2003 18:49:28 +0100
Matthias Andree <matthias.andree at gmx.de> wrote:

> Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> writes:
> 
> > Here is the change which does make the difference. You
> > changed this line:
> > <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
> > to that line:
> > <INITIAL>(ESMTP|SMTP)+
> 
> *I* made the change, and the CVS comment noted that it is supposed to
> speed up the lexer. The original trailing context rule slows the lexer
> down and doesn't seem useful.
> 
> > I don't understand what this is good for. In the original
> > expression the / seems to be wrong, maybe the space behind
> > "id" should also be any kind of whitespace. But why
> > completely remove it?
> >
> > Anyhow, wouldn't the following be nicer:
> > <INITIAL>(E?SMTP)+
> 
> Go figure :-)
> 
> > And why the +? I only see it in the form "with ESMTP id
> > PAA16337" etc., no repeated SMTP or ESMTP. So I would have
> > assumed that version:
> > <INITIAL>E?SMTP{WHITESPACE}+{WHITESPACE}id{ID}
> 
> We may just want to drop it altogether. If we want to drop "constant"
> parts, say, "constant" Received: or Delivered-To: lines, it'd be
> better to strip off the first N Received: lines.

Matthias,

The line has been removed entirely.  After your change, all it was doing
was returning ESMTP or SMTP (if header_line_markup is enabled).  Since
header_line_markup is enabled by default (and I can't think of any reason
to actually need the extra test), treating E?SMTP just like any other text
is the way to go.
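The difference between the original trailing-context rule and the simplified one can be approximated in Python. A lookahead `(?=...)` plays the role of flex's `/` operator: the context must follow the match, but it is not part of the returned token. This is only a sketch. The lexer's `{ID}` definition is not shown in the thread, so the `ID` pattern below is a hypothetical stand-in, and the sample header lines are made up for illustration.

```python
import re

# Hypothetical stand-in for the lexer's {ID} definition (not shown in the
# thread); assumed here to be a run of non-whitespace characters.
ID = r"\S+"

# Original rule: <INITIAL>(ESMTP|SMTP)+/[ \t\n]+id\ {ID}
# flex's "/" trailing context is emulated with a lookahead, so the
# " id PAA16337" part must be present but is not included in the token.
with_context = re.compile(r"(?:ESMTP|SMTP)+(?=[ \t\n]+id " + ID + ")")

# Simplified rule after the change: <INITIAL>(ESMTP|SMTP)+
without_context = re.compile(r"(?:ESMTP|SMTP)+")

def tokens(pattern, text):
    """Return every substring of text matched by pattern."""
    return pattern.findall(text)

line = "Received: from host by mx.example.org with ESMTP id PAA16337"
print(tokens(with_context, line))      # ['ESMTP']
print(tokens(without_context, line))   # ['ESMTP']

other = "Subject: SMTP tips"
print(tokens(with_context, other))     # []
print(tokens(without_context, other))  # ['SMTP']
```

In the usual "with ESMTP id ..." form both rules produce the same token, so the trailing context only changes what happens to E?SMTP elsewhere in a header. Note also that `(E?SMTP)+` describes the same strings as `(ESMTP|SMTP)+`.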

FWIW, I suspect the actual speed difference is immeasurably small.
However, we got lucky, since the size difference _is_ noticeable.
All in all, your change led us in a useful direction.

David
