hints [was: Long, folded "To:" line]

michael at optusnet.com.au michael at optusnet.com.au
Wed Sep 3 05:00:21 CEST 2003


David Relson <relson at osagesoftware.com> writes:
> On 03 Sep 2003 08:55:43 +1000
> michael at optusnet.com.au wrote:
> > Careful. Doesn't the RFC talk about the _folded_ line length
> > not exceeding 998 chars?
> 
> Greetings Michael,
> 
> Indeed, I believe you're right.  How do _you_ think such long lines
> should be handled?  The patch allows a very long line, that will have
> its tokens tagged (if it's a To:, From:, Return-Path:, or Subject:
> line), though folds after the max length aren't tagged.

I confess I haven't read through the 0.15 code yet, but might it
be better to leave the un-folding in the lexer (i.e. having the
lexer eat a newline followed by whitespace, '\n[ \t]', rather than
having the mbox parser do it)?

I'm thinking here that having the mbox parser do it is really a
layering violation: the folding is part of the email RFC, not part of
the mbox format.
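
A minimal sketch of the unfolding step described above, written in Python for brevity rather than as bogofilter's actual lexer code. The function name `unfold` is hypothetical. It assumes the RFC 2822 rule that a fold is a line break immediately followed by a space or tab, and that unfolding removes only the line break:

```python
import re

# A fold is a line break immediately followed by a space or tab.
# The lookahead keeps that whitespace; only the line break is removed.
_FOLD = re.compile(r"\r?\n(?=[ \t])")

def unfold(header_block):
    """Join folded header lines (hypothetical helper, not bogofilter code)."""
    return _FOLD.sub("", header_block)

folded = "To: a@example.com,\n b@example.com\nSubject: hi\n"
print(unfold(folded))
```

Because the rule depends only on the header text, it can sit in the tokenizing layer and stay independent of whether the message came from an mbox file, a maildir, or stdin.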
 
> At the moment bogofilter only has one such special token,
> spc:invalid_end_of_header.  We could add "spc:LineTooLong".  Have you
> other hints that ought to be added?

Lots. :) One of these days I'll get enough time to finish off the
tagging work I've been doing. (I'm tagging things like too much
whitespace in the subject line, the domain in 'From' not matching
the Received lines etc etc).
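
As a rough illustration of what such hint tokens could look like, here is a sketch in Python. It is not the tagging work mentioned above. The function `hint_tokens`, the token name `spc:subject_whitespace`, and the whitespace threshold of 5 are all assumptions made for the example; `spc:LineTooLong` is the name proposed in the quoted message, and 998 is the per-line limit from the RFC:

```python
import re

MAX_LINE = 998  # per-line limit from the RFC, excluding the line terminator

def hint_tokens(raw_headers):
    """Return hint tokens for a raw (still folded) header block.

    Hypothetical helper: token names and thresholds are illustrative only.
    """
    tokens = []
    lines = raw_headers.splitlines()
    # The limit applies to each physical line, so check before unfolding.
    if any(len(line) > MAX_LINE for line in lines):
        tokens.append("spc:LineTooLong")
    for line in lines:
        # Only the first physical Subject line is checked, not its folds.
        if line.lower().startswith("subject:"):
            # 5 consecutive spaces/tabs is an arbitrary threshold.
            if re.search(r"[ \t]{5,}", line):
                tokens.append("spc:subject_whitespace")
    return tokens

print(hint_tokens("Subject: buy      now\n"))
```

A check like the From-domain versus Received-lines comparison would need real address and Received parsing, so it is left out of this sketch.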
 
> By the way, with the new bogoreader.c module, there's now a good place
> for doing end-of-message processing (such as adding special tokens).

Yay! I need to catch up on this.
 
> David

Michael.



