hints [was: Long, folded "To:" line]

David Relson relson at osagesoftware.com
Wed Sep 3 05:29:14 CEST 2003


On 03 Sep 2003 13:00:21 +1000
michael at optusnet.com.au wrote:

> 
> I confess I haven't read through the 0.15 code yet, but might it
> be better to leave the un-folding in the lexer? (i.e. having the
> lexer eat '\n\w' rather than having the mbox parser do it?)
> 
> I'm thinking here that having the mbox parser do it is really a
> layering violation: the folding is part of the email RFC, not part of
> the mbox format.

The processing is (roughly):

lexer_v3.l      yylex()
lexer.c            yyinput()
                     get_decoded_line()
                       get_unfolded_line()
bogoreader.c             reader_getline()
                           mailbox_getline()
                             or
                           simple_getline()
mime.c                 mime_decode()
base64.c                 base64_decode()
                           or
qp.c                     qp_decode()

So, the unfolding is done at a fairly low level and before the lexer
rules ever see the text.

get_unfolded_line() is only called for message header lines and
mime_decode() is only used for message (and mime-part) body lines.
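
To illustrate what unfolding a header line involves, here is a minimal
sketch in C. It is not the code behind get_unfolded_line(); the function
name unfold_header() and its in-place buffer interface are assumptions
made for the example. It follows the usual email rule that a line break
immediately followed by a space or tab continues the previous header
line, so the break is dropped and the whitespace is kept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not bogofilter's get_unfolded_line()).
 * Unfolds header text in place: a line break (LF or CRLF) that is
 * immediately followed by a space or tab is removed, and the
 * whitespace itself is kept. Returns the new length; the result is
 * not NUL-terminated by this function.
 */
size_t unfold_header(char *buf, size_t len)
{
    size_t r = 0;   /* read position  */
    size_t w = 0;   /* write position */

    assert(buf != NULL);

    while (r < len) {
        size_t eol = 0;   /* length of the line break at r, if any */

        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            eol = 2;
        else if (buf[r] == '\n')
            eol = 1;

        /* A break followed by SP or TAB is a fold: skip the break only. */
        if (eol != 0 && r + eol < len &&
            (buf[r + eol] == ' ' || buf[r + eol] == '\t')) {
            r += eol;
            continue;
        }

        buf[w++] = buf[r++];
    }
    return w;
}
```

Doing this below the lexer, as in the call chain above, means the lexer
rules only ever see complete logical header lines.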

> Lots. :) One of these days I'll get enough time to finish off the
> tagging work I've been doing. (I'm tagging things like too much
> whitespace in the subject line, the domain in 'From' not matching
> the Received lines etc etc).
>  
> > By the way, with the new bogoreader.c module, there's now a good
> > place for doing end-of-message processing (such as adding special
> > tokens).
> 
> Yay! I need to catch up with this.

The place you want is, I suspect, the end of the while loop in function
bogofilter() in file bogofilter.c.


-- 
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800



