hints [was: Long, folded "To:" line]
David Relson
relson at osagesoftware.com
Wed Sep 3 05:29:14 CEST 2003
On 03 Sep 2003 13:00:21 +1000
michael at optusnet.com.au wrote:
>
> I confess I haven't read through the 0.15 code yet, but might it
> be better to leave the un-folding in the lexer? (i.e. having the
> lexer eat '\n\w' rather than having the mbox parser do it?)
>
> I'm thinking here that having the mbox parser do it is really a
> layering violation: The folding is part of the email rfc, not part of
> the mbox format.
The processing is (roughly):

  lexer_v3.l    yylex()
  lexer.c       yyinput()
                get_decoded_line()
                get_unfolded_line()
  bogoreader.c  reader_getline()
                mailbox_getline()
                  or
                simple_getline()
  mime.c        mime_decode()
  base64.c      base64_decode()
                  or
  qp.c          qp_decode()

So, the unfolding is done at a fairly low level and before the lexer
rules ever see the text.
get_unfolded_line() is only called for message header lines and
mime_decode() is only used for message (and mime-part) body lines.
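
To make the unfolding step concrete, here is a minimal sketch of header unfolding as the mail RFCs describe it: a line break that is immediately followed by a space or tab is dropped, and the whitespace after it is kept. This is not the code of get_unfolded_line(); the function name unfold_header() and its buffer-based interface are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: NOT bogofilter's get_unfolded_line().
 * Copy the header text in `src` to `dst`, removing every line break
 * (LF or CRLF) that is immediately followed by a space or tab.
 * Returns the length of the unfolded text, or (size_t)-1 if `dst`
 * (of size `dstsize`, including the terminating NUL) is too small.
 */
size_t unfold_header(char *dst, size_t dstsize, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL);

    while (*src != '\0') {
        size_t brk = 0;   /* length of a line break at src, if any */

        if (src[0] == '\r' && src[1] == '\n')
            brk = 2;
        else if (src[0] == '\n')
            brk = 1;

        if (brk != 0 && (src[brk] == ' ' || src[brk] == '\t')) {
            src += brk;   /* folded line: drop the break, keep the blank */
            continue;
        }

        if (n + 1 >= dstsize)   /* need room for this char and the NUL */
            return (size_t)-1;
        dst[n++] = *src++;
    }

    if (n >= dstsize)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

Given "To: a@example.com,\n b@example.com\n" the sketch produces the single line "To: a@example.com, b@example.com\n". The final newline stays because nothing folded follows it. Doing this before the lexer runs means the lexer rules only ever see one logical header line.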
> Lots. :) One of these days I'll get enough time to finish off the
> tagging work I've been doing. (I'm tagging things like too much
> whitespace in the subject line, the domain in 'From' not matching
> the Received lines, etc.).
>
> > By the way, with the new bogoreader.c module, there's now a good
> > place for doing end-of-message processing (such as adding special
> > tokens).
>
> Yay! I need to catch up with this.
The place you want is, I suspect, file bogofilter.c, function
bogofilter, at the end of the while loop.
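
As a rough illustration of the kind of check that could run once per message at that point, here is a sketch of a "too much whitespace in the subject" tag. Everything in it is hypothetical: the function name subject_whitespace_tag(), the threshold WS_RUN_LIMIT, and the token text are invented for this example and are not part of bogofilter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold: a run of this many blanks earns a tag. */
#define WS_RUN_LIMIT 5

/*
 * Illustrative only: not a bogofilter function.
 * Return a made-up tag token if `subject` contains a run of at least
 * WS_RUN_LIMIT consecutive spaces or tabs, otherwise return NULL.
 */
const char *subject_whitespace_tag(const char *subject)
{
    size_t run = 0;   /* length of the current run of blanks */

    assert(subject != NULL);

    for (; *subject != '\0'; subject++) {
        if (*subject == ' ' || *subject == '\t') {
            if (++run >= WS_RUN_LIMIT)
                return "tag:subj-whitespace";
        } else {
            run = 0;
        }
    }
    return NULL;
}
```

A caller would run this after the whole message has been read and, when the result is not NULL, add the returned token to the message's token list alongside the ordinary ones.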
--
David Relson Osage Software Systems, Inc.
relson at osagesoftware.com Ann Arbor, MI 48103
www.osagesoftware.com tel: 734.821.8800