lexer header/body

David Relson relson at osagesoftware.com
Thu Apr 8 13:32:39 CEST 2004


On Thu, 08 Apr 2004 12:03:13 +0200
Boris 'pi' Piwinger wrote:

> David Relson wrote:
> 
> >> > I haven't looked at the lexer code in a long time. Anyways, the
> >> > angle brackets contain a "state" - the lexer bogofilter uses is a
> >> > state machine to some extent, the state is changed by the
> >> > BEGIN(newstate) statements. <INITIAL> should be the state where we
> >> > parse headers, and a blank line should switch the lexer to some kind
> >> > of body mode, away from INITIAL.
> >> 
> >> You seem to be right. I just built a version with this:
> >> <INITIAL>^Status:.*                             /* ignore */
> > 
> > INITIAL is the customary name for a lexer's initial state.  As it's
> > just a symbol, it could be renamed to HEADER or DAVID or PI or
> > anything else.  However, there's no need for doing that.
> 
> I understand that. The question is if it always means
> header, in which case another name would be more readable.

You have a point here.  I'll think about it.

> Yes, as I wrote I have found it. The question was if it is
> needed in those functions. IOW: Can INITIAL actually match
> if it is true? If so another solution seems to be to let -H
> get the lexer into another state right away. I don't
> understand enough to answer this question yet, therefore I
> am asking.

Other things happen in INITIAL mode that are separate from the header
tagging.  For example, there's recognition of folded lines, message IDs,
VERPs, etc.  The lexer _could_ have two flavors of INITIAL, i.e.
INITIAL_WITH_H_FLAG and INITIAL_WITHOUT_H.  That would be much more
complex than what we have.
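
To make the trade-off concrete, here is a rough sketch in plain C rather
than flex.  It is not bogofilter's lexer: the names scan_state, scanner,
tag_headers and scan_line are hypothetical, and the line-at-a-time
classification is a simplification.  It keeps one header state (the role
INITIAL plays) and checks a flag only where header tagging happens, so
folded-line recognition and the blank-line switch to body mode are shared
instead of being duplicated in two states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is not bogofilter's actual code. */
enum scan_state { ST_HEADER, ST_BODY };  /* ST_HEADER plays the role of INITIAL */

struct scanner {
    enum scan_state state;
    bool tag_headers;  /* assumed stand-in for "header tagging is enabled" */
};

/* Classify one line (no trailing newline) and write a label into out. */
static void scan_line(struct scanner *s, const char *line,
                      char *out, size_t outlen)
{
    assert(s != NULL && line != NULL && out != NULL && outlen > 0);

    if (s->state == ST_BODY) {
        snprintf(out, outlen, "body");
    } else if (line[0] == '\0') {
        /* A blank line ends the header, like BEGIN(BODY) in a flex rule. */
        s->state = ST_BODY;
        snprintf(out, outlen, "separator");
    } else if (line[0] == ' ' || line[0] == '\t') {
        /* Folded (continuation) line: recognized regardless of the flag. */
        snprintf(out, outlen, "folded");
    } else if (s->tag_headers) {
        /* Tag the line with its field name, i.e. the text before ':'. */
        const char *colon = strchr(line, ':');
        int n = colon ? (int)(colon - line) : 0;
        snprintf(out, outlen, "header:%.*s", n, line);
    } else {
        /* Same state, same rules; only the tagging step is skipped. */
        snprintf(out, outlen, "header");
    }
}
```

With two separate header states, the blank-line and folded-line branches
would have to appear in both, which is the extra complexity mentioned
above.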



