unescaped "From " lines

Gyepi SAM gyepi at praxis-sw.com
Sun Jan 26 18:26:27 CET 2003


On Sun, Jan 26, 2003 at 09:51:36AM -0500, David Relson wrote:
> Messages have headers and bodies.  "From " as a message separator only 
> occurs in the header.  If bogofilter knew whether it was in a header or a 
> body, life would be simple.  The first state change from header to body is 
> trivial - just the first empty line.
> 
> When to change from body back to header is not trivial.

In fact, it cannot be done correctly all of the time if we accept that '=46rom ' in a quoted-printable (qp) encoded body
decodes to 'From ' with no other contextual information.
We could say that '^From ' inside an encoded body part is just a token and does not mark a new message.
This complicates things even more, but it is doable.  What about '^From ' inside a single message in a Maildir
mailbox? Do we now have to know the mailbox format?
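
To illustrate the quoted-printable point, here is a minimal sketch in Python (not bogofilter code; the sample body line is made up). It uses the standard quopri module to show that a body line which is safe in its encoded form turns into something that looks exactly like a message separator once it is decoded:

```python
import quopri

# A made-up body line whose leading "F" was encoded as =46 so that the
# raw line does not start with "From ".
encoded_body = b"=46rom here on, this is still the same message.\n"

# Decode the quoted-printable text, as a tokenizer would before lexing.
decoded_body = quopri.decodestring(encoded_body)

# Before decoding the line is harmless; after decoding it starts with "From ".
raw_looks_like_separator = encoded_body.startswith(b"From ")
decoded_looks_like_separator = decoded_body.startswith(b"From ")
```

Any check for '^From ' that runs on decoded text would therefore need extra context (for example, knowing that it is inside an encoded body part) to avoid treating this line as the start of a new message.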

I say we stop trying to decide and simply say that bogofilter deals with a single message at a time.
We pay a penalty when training on a mailbox, but IMO, that penalty is not enough to justify the contortions
we must go through to handle an infrequent edge case. Most of the time, bogofilter deals with single messages anyway.

If it turns out that the penalty is unacceptable, I will write bogotrain-mbx or whatever we call it.
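
As a rough idea of what such a helper might do, here is a hypothetical sketch in Python. The function name split_mbox and the rule it applies (a line starting with "From " that follows a blank line begins a new message) are assumptions for illustration, not an existing bogofilter tool:

```python
def split_mbox(lines):
    """Yield each message in an mbox as a list of lines.

    Assumption: a new message starts at a line beginning with "From "
    that is the first line of the file or follows a blank line.
    """
    message = []
    previous_blank = True  # the start of the file counts as a boundary
    for line in lines:
        if line.startswith("From ") and previous_blank and message:
            yield message
            message = []
        message.append(line)
        previous_blank = line.strip() == ""
    if message:
        yield message


# Made-up two-message mailbox for demonstration.
sample = [
    "From a@example.com Sun Jan 26 18:26:27 2003\n",
    "Subject: one\n",
    "\n",
    "body one\n",
    "\n",
    "From b@example.com Sun Jan 26 18:30:00 2003\n",
    "Subject: two\n",
    "\n",
    "body two\n",
]
messages = list(split_mbox(sample))
```

Each yielded message could then be handed to bogofilter one at a time. Note that this sketch still splits in the wrong place if a body contains an unescaped "From " line right after a blank line, which is the same edge case discussed above; it only moves the problem out of bogofilter itself.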

-Gyepi 



