lexer header/body

David Relson relson at osagesoftware.com
Wed Apr 7 23:28:38 CEST 2004


On Wed, 07 Apr 2004 15:46:42 +0200
Boris 'pi' Piwinger wrote:

> Matthias Andree wrote:
> 
> > I haven't looked at the lexer code in a long time. Anyways, the
> > angle brackets contain a "state" - the lexer bogofilter uses is a
> > state machine to some extent, the state is changed by the
> > BEGIN(newstate) statements. <INITIAL> should be the state where we
> > parse headers, and a blank line should switch the lexer to some kind
> > of body mode, away from INITIAL.
> 
> You seem to be right. I just built a version with this:
> <INITIAL>^Status:.*                             /* ignore */

INITIAL is the customary name for a lexer's initial state.  As it's just
a symbol, it could be renamed to HEADER or DAVID or PI or anything else.
However, there's no need to do that.
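
To make the state idea concrete, here is a minimal sketch in plain C
(not the generated lexer) of the behaviour tested in the quoted message:
a Status: line is ignored only while in the initial (header) state, and
a blank line switches to body mode. The names ST_INITIAL, ST_BODY and
scan_line are hypothetical and do not appear in lexer_v3.l.

```c
#include <string.h>

/* Hypothetical state names for this sketch; they only imitate the
 * lexer's start conditions. */
enum lex_state { ST_INITIAL, ST_BODY };

/* Returns 1 if the line would be skipped, 0 if it would be tokenized.
 * Updates *state the way a BEGIN(newstate) on a blank line would. */
static int scan_line(enum lex_state *state, const char *line)
{
    if (*state == ST_INITIAL) {
        if (line[0] == '\0') {      /* blank line ends the header */
            *state = ST_BODY;
            return 1;
        }
        if (strncmp(line, "Status:", 7) == 0)
            return 1;               /* like <INITIAL>^Status:.*  ignore */
    }
    return 0;                       /* everything else gets tokenized */
}
```

Feeding it "Status: RO", then an empty line, then "Status: RO" again
skips the first occurrence and tokenizes the second, which matches the
result reported above.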


> I tested a message with a Status header and the same line in
> the body. It was recognized in the body only. Great. So
> INITIAL seems to be HEADER (maybe this is what we should
> call it then). What I do not understand then ...
> 
> token.c defines some functions called from lexer_v3.l, like
> set_tag. This function checks if we have header_line_markup.
> This seems unneeded then. Is this overly careful? Or is it a
> leftover? If not how and why is it needed?

header_line_markup is in use.  The '-H' flag turns it off so that a
message can be processed without the normal prefixes.  Run the command
"grep header_line_markup *.[ch]" and you'll find where it's used.
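
As a rough illustration of what such a flag does, here is a minimal
sketch, assuming the flag simply decides whether a prefix is put in
front of a header token. The helper tag_token, its signature and the
"subj:" prefix are made up for this example; the real logic is in
token.c and may well differ.

```c
#include <stdbool.h>
#include <stdio.h>

/* Named after the flag discussed above; the default value and how '-H'
 * clears it are assumptions of this sketch. */
static bool header_line_markup = true;

/* Hypothetical helper: writes prefix + word into out when markup is on,
 * and just the bare word when it is off. */
static void tag_token(char *out, size_t outlen,
                      const char *prefix, const char *word)
{
    if (header_line_markup)
        snprintf(out, outlen, "%s%s", prefix, word);
    else
        snprintf(out, outlen, "%s", word);
}
```

With the flag on, tag_token(buf, sizeof buf, "subj:", "hello") leaves
"subj:hello" in buf; with it off, buf holds just "hello".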

David



