lexer header/body
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Thu Apr 8 12:03:13 CEST 2004
David Relson wrote:
>> > I haven't looked at the lexer code in a long time. Anyway, the
>> > angle brackets contain a "state": the lexer bogofilter uses is, to
>> > some extent, a state machine, and the state is changed by the
>> > BEGIN(newstate) statements. <INITIAL> should be the state where we
>> > parse headers, and a blank line should switch the lexer to some kind
>> > of body mode, away from INITIAL.
>>
>> You seem to be right. I just built a version with this:
>> <INITIAL>^Status:.* /* ignore */
>
> INITIAL is the customary name for a lexer's initial state. As it's just
> a symbol, it could be renamed to HEADER or DAVID or PI or anything else.
> However there's no need for doing that.
I understand that. The question is whether it always means
the header, in which case another name would be more readable.
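To make the start-condition idea concrete, here is a minimal C sketch of the behaviour described above. The lexer begins in a header state (INITIAL), a header-only rule ignores Status: lines, and a blank line switches to a body state where that rule no longer applies. The names lex_state, LEX_INITIAL, LEX_BODY and lex_line are hypothetical. This is not bogofilter's lexer_v3.l code; flex expresses the same idea with start-condition declarations (%s or %x) and BEGIN(...) actions. The sketch assumes lines arrive without their trailing newline and matches "Status:" case-sensitively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states mirroring flex start conditions conceptually;
   not bogofilter's actual code. */
enum lex_state { LEX_INITIAL /* header */, LEX_BODY };

struct lexer {
    enum lex_state state;
};

/* Feed one line (no trailing newline). Returns true if the line
   would be tokenized, false if it is ignored. */
static bool lex_line(struct lexer *lx, const char *line)
{
    assert(lx != NULL && line != NULL);
    if (lx->state == LEX_INITIAL) {
        if (line[0] == '\0') {          /* blank line ends the header */
            lx->state = LEX_BODY;       /* analogous to BEGIN(BODY) in flex */
            return false;
        }
        /* Mirrors the rule <INITIAL>^Status:.* (ignore): header-only. */
        if (strncmp(line, "Status:", 7) == 0)
            return false;
        return true;
    }
    return true;  /* body state: no Status: rule, so the line is tokenized */
}
```

Fed a message with a Status: header and the same line in the body, the header copy is dropped while the body copy is tokenized, which matches the result reported for the <INITIAL>^Status:.* rule.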
>> I tested a message with a Status header and the same line in
>> the body. It was recognized in the body only. Great. So
>> INITIAL seems to be HEADER (maybe this is what we should
>> call it then). What I do not understand then ...
>>
>> token.c defines some functions called from lexer_v3.l, like
>> set_tag. This function checks whether header_line_markup is set.
>> That seems unneeded then. Is this overly careful? Or is it a
>> leftover? If not, how and why is it needed?
>
> header_line_markup is in use. The '-H' flag turns it off so that a
> message can be processed without the normal prefixes. Run the command
> "grep header_line_markup *.[ch]" and you'll find where it's used.
Yes, as I wrote, I have found it. The question was whether it is
needed in those functions. IOW: can INITIAL actually match
if it is true? If so, another solution seems to be to let -H
put the lexer into another state right away. I don't
understand enough to answer this question yet, so I
am asking.
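The two options under discussion can be sketched side by side in C. Approach A keeps header rules reachable and has a helper check the flag before adding a header prefix, roughly what set_tag is described as doing. Approach B, the alternative suggested above, makes -H start the lexer in a body state so header rules never match. All names (start_state, make_token, initial_state) and the "tag:word" token format are hypothetical illustrations, not bogofilter's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical start states: a sketch, not bogofilter's lexer. */
enum start_state { ST_HEADER, ST_BODY };

/* Approach A (hypothetical stand-in for a set_tag-style helper):
   header rules can still fire, so the prefix is added only when
   header_line_markup is on; -H would turn it off. */
static void make_token(bool header_line_markup, const char *tag,
                       const char *word, char *out, size_t outlen)
{
    assert(word != NULL && out != NULL && outlen > 0);
    if (header_line_markup && tag != NULL)
        snprintf(out, outlen, "%s:%s", tag, word);  /* e.g. "subj:hello" */
    else
        snprintf(out, outlen, "%s", word);          /* plain token */
}

/* Approach B: choose the starting state from the option, so with -H
   the header-state rules are never active and need no flag check. */
static enum start_state initial_state(bool opt_H)
{
    return opt_H ? ST_BODY : ST_HEADER;
}
```

One trade-off the sketch makes visible: under approach B every header-only rule, such as one ignoring Status: lines, would also stop applying when -H is given, so it only preserves current behaviour if those rules are meant to be disabled together with the prefixes.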
pi
More information about the bogofilter-dev mailing list