lexer header/body
David Relson
relson at osagesoftware.com
Thu Apr 8 13:32:39 CEST 2004
On Thu, 08 Apr 2004 12:03:13 +0200
Boris 'pi' Piwinger wrote:
> David Relson wrote:
>
> >> > I haven't looked at the lexer code in a long time. Anyways, the
> >> > angle brackets contain a "state" - the lexer bogofilter uses is a
> >> > state machine to some extent, the state is changed by the
> >> > BEGIN(newstate) statements. <INITIAL> should be the state where we
> >> > parse headers, and a blank line should switch the lexer to some kind
> >> > of body mode, away from INITIAL.
> >>
> >> You seem to be right. I just built a version with this:
> >> <INITIAL>^Status:.* /* ignore */
> >
> > INITIAL is the customary name for a lexer's initial state. As it's
> > just a symbol, it could be renamed to HEADER or DAVID or PI or
> > anything else.
> > However there's no need for doing that.
>
> I understand that. The question is if it always means
> header, in which case another name would be more readable.
You have a point here. I'll think about it.
> Yes, as I wrote I have found it. The question was if it is
> needed in those functions. IOW: Can INITIAL actually match
> if it is true? If so another solution seems to be to let -H
> get the lexer into another state right away. I don't
> understand enough to answer this question yet, therefore I
> am asking.
Other things happen in INITIAL mode that are separate from the header
tagging. For example, there's recognition of folded lines, message ids,
VERPs, etc. The lexer _could_ have two flavors of INITIAL, i.e.
INITIAL_WITH_H_FLAG and INITIAL_WITHOUT_H, but that would be much more
complex than what we have.
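
As a rough illustration of that trade-off (a sketch only, not bogofilter's
lexer code), the C fragment below mimics a lexer with a single header state
that checks a flag at the point of header tagging, instead of duplicating the
whole state for -H. The names HEADER, BODY, scan_line and tag_headers are
hypothetical, and it assumes -H simply turns header tagging off.

```c
#include <string.h>

/* Hypothetical state names; flex would call the first one INITIAL. */
enum lexer_state { HEADER, BODY };

/* Classify one line (without its trailing newline) and update the state,
 * roughly the way a BEGIN(newstate) statement would in a flex rule.
 * Returns a short label describing what was done with the line. */
static const char *scan_line(enum lexer_state *state, const char *line,
                             int tag_headers)
{
    if (*state == HEADER) {
        if (line[0] == '\0') {
            /* A blank line ends the headers: leave the header state. */
            *state = BODY;
            return "switch";
        }
        if (strncmp(line, "Status:", 7) == 0)
            return "ignored";   /* comparable to <INITIAL>^Status:.* */

        /* Everything else in the header state is shared; only the
         * tagging step looks at the flag. */
        return tag_headers ? "header-tagged" : "header-plain";
    }
    /* Header-only rules no longer apply once we are in the body. */
    return "body";
}
```

Feeding it the lines "Subject: hi", "Status: RO", "" and "Status: fine" in
that order yields header-tagged, ignored, switch and body: the Status rule
fires only while the lexer is still in the header state, and the flag
changes nothing except how ordinary header lines are labeled.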