lexer header/body

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Apr 7 12:31:22 CEST 2004


Hi!

I was just thinking about adding something to the lexer
which would ignore Status lines (they are added in the
mailbox, so the message no longer looks the way it did when
bogofilter originally saw it). The lexer already has some
rules to catch various header fields. AFAIU they always work
like this:

1) Identify something which looks like a header field.

2) Do something with it in a function.
2a) In the function, check whether we are in the header
(there is a variable which has this information). If so,
treat it as a header field; if not, refuse to work on it.

Is this correct?
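
To make clear what I mean by steps 1) to 2a), here is a minimal
sketch in plain C rather than flex. It is not bogofilter's code:
the names in_header, looks_like_header_field and
handle_header_field are made up for illustration, and the real
patterns are certainly more specific than "name, then a colon".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag standing in for the lexer's "are we in the header?"
 * variable. */
static bool in_header = true;

/* Step 1: does the line look like "Name: value"?
 * Simplified on purpose: a non-empty name without blanks before the colon. */
static bool looks_like_header_field(const char *line)
{
    const char *colon;
    const char *p;

    assert(line != NULL);
    colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return false;
    for (p = line; p < colon; p++) {
        if (*p == ' ' || *p == '\t')
            return false;
    }
    return true;
}

/* Steps 2 and 2a: the handler runs for every match, header or body,
 * and only then finds out whether it is allowed to act. */
static bool handle_header_field(const char *line)
{
    if (!looks_like_header_field(line))
        return false;
    if (!in_header)
        return false;   /* matched in the body: refuse to work on it */
    return true;        /* treated as a header field */
}
```

The point is that the pattern match in step 1 happens for every
line of the message, and the header check only comes afterwards.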

If so, wouldn't it work (and maybe be faster) to have a
clear HEADER state (instead of using INITIAL all the time)
to deal with those things? Once we leave the header, those
rules would never be used again.
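
Roughly what I have in mind, again as a plain C sketch and not
as flex code (in flex this would presumably be a start condition
switched with BEGIN). All names here are hypothetical, and I
assume lines arrive without their trailing newline and that the
first empty line ends the header. MIME part headers in multipart
messages are not handled.

```c
#include <assert.h>
#include <string.h>

enum lexer_state { STATE_HEADER, STATE_BODY };

enum line_kind {
    KIND_IGNORED,       /* header line we deliberately drop, e.g. Status */
    KIND_HEADER_FIELD,  /* any other header line */
    KIND_SEPARATOR,     /* the empty line that ends the header */
    KIND_BODY_TEXT      /* everything after the header */
};

/* Classify one line and update the state. Header rules are only
 * consulted while the state is STATE_HEADER. */
static enum line_kind classify_line(enum lexer_state *state, const char *line)
{
    assert(state != NULL && line != NULL);

    if (*state == STATE_BODY)
        return KIND_BODY_TEXT;          /* header rules never run here */

    if (line[0] == '\0') {
        *state = STATE_BODY;            /* leave the header for good */
        return KIND_SEPARATOR;
    }
    if (strncmp(line, "Status:", 7) == 0)
        return KIND_IGNORED;

    return KIND_HEADER_FIELD;
}
```

With this arrangement a line starting with "Status:" in the body
is ordinary body text, and no per-rule check of a header flag is
needed.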

Let me give one example:
<INITIAL>(file)?name=\"?			/* ignore */
<INITIAL>\n?[[:blank:]]id\ {ID}			/* ignore */

Those seem to work in both header and body. I believe this
is intended to capture MIME headers in multipart messages as
well. But it is not precise (it probably won't do any harm,
though): it would match this text anywhere in the body. Right?

If I used something like
<INITIAL>^Status:.* /* ignore */
this would also match too much (and probably also do no harm).
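
To illustrate "too much": a small C sketch using POSIX regcomp()
and regexec() as a stand-in for the flex patterns. This is only
an approximation (flex and POSIX extended regular expressions
are not identical; for example the quote is written without a
backslash here), and pattern_matches is a made-up helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex `pattern` matches somewhere
 * in `text`. Hypothetical helper, for illustration only. */
static bool pattern_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;
    bool found;

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);    /* the patterns used here are known to compile */
    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

/* Approximations of the two rules discussed above. */
static const char NAME_RULE[]   = "(file)?name=\"?";
static const char STATUS_RULE[] = "^Status:.*";

/* Both rules fire on ordinary body text, not just on header lines. */
static bool rules_fire_on_body_text(void)
{
    return pattern_matches(NAME_RULE, "please send me your username=foo")
        && pattern_matches(STATUS_RULE, "Status: everything is fine here");
}
```

The anchor in the Status rule only requires the text to be at the
start of a line, so a body line beginning with "Status:" is
swallowed just like the real header field.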

Are my thoughts correct?

I am sorry, I don't understand this well enough to code my
idea about HEADER.

pi



