lexer header/body

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Apr 8 12:03:13 CEST 2004


David Relson wrote:

>> > I haven't looked at the lexer code in a long time. Anyways, the
>> > angle brackets contain a "state" - the lexer bogofilter uses is a
>> > state machine to some extent, the state is changed by the
>> > BEGIN(newstate) statements. <INITIAL> should be the state where we
>> > parse headers, and a blank line should switch the lexer to some kind
>> > of body mode, away from INITIAL.
>> 
>> You seem to be right. I just built a version with this:
>> <INITIAL>^Status:.*                             /* ignore */
> 
> INITIAL is the customary name for a lexer's initial state.  As it's just
> a symbol, it could be renamed to HEADER or DAVID or PI or anything else.
>  However there's no need for doing that.

I understand that. The question is whether it always means
header; if so, another name would be more readable.

>> I tested a message with a Status header and the same line in
>> the body. It was recognized in the body only. Great. So
>> INITIAL seems to be HEADER (maybe this is what we should
>> call it then). What I do not understand then ...
>> 
>> token.c defines some functions called from lexer_v3.l, like
>> set_tag. This function checks if we have header_line_markup.
>> This seems unneeded then. Is this overly careful? Or is it a
>> leftover? If not how and why is it needed?
> 
> header_line_markup is in use.  The '-H' flag turns it off so that a
> message can be processed without the normal prefixes.  Run command "grep
> header_line_markup *.[ch]" and you'll find where it's used.

Yes, as I wrote, I have found it. The question was whether
the check is needed in those functions. IOW: Can INITIAL
actually match if it is true? If so, another solution would
be to have -H put the lexer into another state right away.
I don't understand enough to answer this question yet,
which is why I am asking.
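
To make the idea concrete, here is a rough sketch in plain C (not
flex, and not taken from the bogofilter sources). All names in it
(lex_state, ST_HEADER, ST_BODY, count_kept_lines) are made up for
illustration, and the behaviour of -H shown here is only the proposal
above, i.e. an assumption: with header_line_markup off, we start in
body mode, so the header-only rule can never fire and needs no
extra check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for flex start conditions. */
enum lex_state { ST_HEADER, ST_BODY };

/*
 * Count the non-blank lines that would be handed on for tokenizing.
 * In header state a "Status:" line is ignored; in body state it is kept.
 * Assumption: header_line_markup == false (the -H case) starts the
 * machine in body state, as proposed in the mail, not as bogofilter does.
 */
int count_kept_lines(const char *const *lines, size_t n,
                     bool header_line_markup)
{
    assert(lines != NULL || n == 0);

    enum lex_state state = header_line_markup ? ST_HEADER : ST_BODY;
    int kept = 0;

    for (size_t i = 0; i < n; i++) {
        const char *line = lines[i];

        if (line[0] == '\0') {
            /* A blank line ends the header; it is never counted. */
            state = ST_BODY;
            continue;
        }
        if (state == ST_HEADER && strncmp(line, "Status:", 7) == 0)
            continue; /* header-only rule, like <INITIAL>^Status:.* */

        kept++;
    }
    return kept;
}
```

With a message that has a Status line in both header and body, the
header one is dropped only when we start in header state. If the real
lexer could be started in body state for -H, the per-function test of
header_line_markup might become redundant, but that is exactly the
part I am unsure about.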

pi



