defining empty lines.
Greg Louis
glouis at dynamicro.on.ca
Sat May 17 22:15:13 CEST 2003
On 20030517 (Sat) at 1551:25 -0400, David Relson wrote:
> RFC2822 specifies "The body is simply a sequence of characters that follows
> the header and is separated from the header by an empty line (i.e., a line
> with nothing preceding the CRLF)."
>
> Jeremy Blosser has encountered many spam messages where "\b\r\n" appears in
> this position. Bogofilter is looking for the truly empty lines for writing
> out the "X-Bogosity" line (in passthrough mode) and gets it wrong for these
> messages.
>
> It's easy enough to modify the code to treat any line consisting only of
> whitespace characters as an empty line.
Such lines should be recognized and tokenized, maybe. I don't suppose
they occur very often in legitimate mail, so they might be a useful
spam indicator!
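A minimal sketch of the rule discussed above, in Python, using a
hypothetical split_message() helper (this is not bogofilter's code).
The header/body separator is taken to be the first line that is empty,
or that contains only "blank" characters once its CRLF is removed, and
a flag records when the separator was blank but not truly empty, so it
could be emitted as a token. Note that a backspace ("\b") is not
whitespace under C's isspace() or Python's str.isspace(), so the
hypothetical BLANK_CHARS set lists it explicitly.

```python
# Hypothetical set of characters allowed on an "effectively empty" separator
# line. Space and tab are ordinary whitespace; "\b" (backspace) is included
# only because the reported spam carried "\b\r\n" in that position.
BLANK_CHARS = " \t\b"


def split_message(raw: str) -> tuple[list[str], list[str], bool]:
    """Split a raw message into header lines and body lines.

    Returns (headers, body, blank_separator). blank_separator is True when
    the separator line held blank characters instead of being truly empty,
    as RFC 2822 requires; a filter might treat that as a spam clue.
    """
    lines = raw.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by a final newline
    # Strip the CR of each CRLF line ending.
    lines = [ln[:-1] if ln.endswith("\r") else ln for ln in lines]

    for i, line in enumerate(lines):
        # Empty, or made up only of BLANK_CHARS: treat as the separator.
        if line.strip(BLANK_CHARS) == "":
            return lines[:i], lines[i + 1:], line != ""

    # No separator found: treat the whole message as headers.
    return lines, [], False


if __name__ == "__main__":
    msg = "From: a@example.com\r\nSubject: hi\r\n\b\r\nbody text\r\n"
    headers, body, blank = split_message(msg)
    print(headers, body, blank)
```

Under this rule, passthrough mode would write its X-Bogosity line just
before the separator, and the boolean could be turned into an extra
token; both uses are illustrations under the stated assumptions.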
BTW, I'm rerunning my tests of P options (ignore case, tag headers,
process A, IMG and FONT HTML tags) after correcting two human errors,
and the results of the first 3 of 4 reruns suggest that ignoring case
is not a good thing to do, tagging headers is a very good idea, and
processing the contents of those HTML tags helps too. A proper writeup
will appear on my website in a day or two.
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |