defining empty lines.

Greg Louis glouis at dynamicro.on.ca
Sat May 17 22:15:13 CEST 2003


On 20030517 (Sat) at 1551:25 -0400, David Relson wrote:

> RFC2822 specifies "The body is simply a sequence of characters that follows 
> the header and is separated from the header by an empty line (i.e., a line 
> with nothing preceding the CRLF)."
> 
> Jeremy Blosser has encountered many spam messages where "\b\r\n" appears in 
> this position.  Bogofilter is looking for the truly empty lines for writing 
> out the "X-Bogosity" line (in passthrough mode) and gets it wrong for these 
> messages.
> 
> It's easy enough to modify the code to treat any line consisting only of 
> whitespace characters as an empty line.

Such lines should be recognized, and maybe tokenized as well.  I don't
suppose they occur very often in legitimate mail; they might be a useful
spam indicator!

BTW, I'm rerunning my tests of P options (ignoring case, tagging headers,
and processing the contents of A, IMG and FONT HTML tags) after correcting
two human errors.  The results of the first 3 of 4 reruns suggest that
ignoring case is not a good thing to do, tagging headers is a very good
idea, and processing those HTML tag contents helps too.  A proper writeup
will appear on my website in a day or two.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |
