lexer header/body
David Relson
relson at osagesoftware.com
Wed Apr 7 23:28:38 CEST 2004
On Wed, 07 Apr 2004 15:46:42 +0200
Boris 'pi' Piwinger wrote:
> Matthias Andree wrote:
>
> > I haven't looked at the lexer code in a long time. Anyway, the
> > angle brackets contain a "state": the lexer bogofilter uses is a
> > state machine to some extent, and the state is changed by the
> > BEGIN(newstate) statements. <INITIAL> should be the state where we
> > parse headers, and a blank line should switch the lexer to some kind
> > of body mode, away from INITIAL.
>
> You seem to be right. I just built a version with this:
> <INITIAL>^Status:.* /* ignore */
INITIAL is the customary name for a lexer's initial state. As it's just
a symbol, it could be renamed to HEADER or DAVID or PI or anything else.
However, there's no need to do that.
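
For anyone following along, here is a rough sketch in plain C of the
header/body state switch described above. It is not bogofilter's lexer:
the names lex_state, mini_lexer and mini_lexer_skip_line are made up for
illustration, and it assumes lines arrive without their trailing newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical states; STATE_HEADER plays the role of flex's INITIAL. */
enum lex_state { STATE_HEADER, STATE_BODY };

struct mini_lexer {
    enum lex_state state;
};

/*
 * Feed one line (without its trailing newline) to the sketch lexer.
 * Returns true if the line is ignored, false if it would be tokenized.
 * The first blank line moves the lexer from header state to body state.
 */
static bool mini_lexer_skip_line(struct mini_lexer *lx, const char *line)
{
    assert(lx != NULL && line != NULL);

    if (lx->state == STATE_HEADER) {
        if (line[0] == '\0') {
            /* Blank line ends the header; comparable to BEGIN(newstate). */
            lx->state = STATE_BODY;
            return false;
        }
        /* Comparable to the rule <INITIAL>^Status:.*  (header state only). */
        if (strncmp(line, "Status:", 7) == 0)
            return true;
    }
    /* In body state the Status: rule is not active, so nothing is skipped. */
    return false;
}
```

Starting in STATE_HEADER, a "Status:" line is skipped before the blank
line and kept after it, which matches the test result quoted below.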
> I tested a message with a Status header and the same line in
> the body. It was recognized in the body only. Great. So
> INITIAL seems to be HEADER (maybe this is what we should
> call it then). What I do not understand then ...
>
> token.c defines some functions called from lexer_v3.l, like
> set_tag. This function checks if we have header_line_markup.
> This seems unneeded then. Is this overly careful? Or is it a
> leftover? If not how and why is it needed?
header_line_markup is in use. The '-H' flag turns it off so that a
message can be processed without the normal prefixes. Run
"grep header_line_markup *.[ch]" and you'll find where it's used.
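
To show the idea only, here is a small C sketch of a flag that switches
header prefixes on and off. It is not the code in token.c: tag_token is a
hypothetical helper, the "subj:" prefix in the note below is just an example
value, and the flag's default of true is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Name taken from this thread; default value assumed for the sketch. */
static bool header_line_markup = true;

/*
 * Hypothetical helper: write token into out, prepending prefix only
 * when header_line_markup is enabled. Returns what snprintf returns,
 * i.e. the length the full result needs.
 */
static int tag_token(const char *prefix, const char *token,
                     char *out, size_t outlen)
{
    assert(token != NULL && out != NULL && outlen > 0);

    if (header_line_markup && prefix != NULL)
        return snprintf(out, outlen, "%s%s", prefix, token);

    /* Markup disabled (as with '-H'): store the bare token. */
    return snprintf(out, outlen, "%s", token);
}
```

With the flag set, tag_token("subj:", "hello", buf, sizeof buf) stores
"subj:hello"; with the flag cleared it stores just "hello".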
David