[patch] tag Received lines and parse headers better.
David Relson
relson at osagesoftware.com
Sun Jul 20 19:21:59 CEST 2003
Michael,
Some more testing results.
First, the code to eat newlines for multi-line headers has a problem. When
processing mbx files, the message count is wrong. Additionally, lines
matching "\tid MESSAGE_ID" were parsed as continuations of Received: headers
rather than being ignored. This results in lots of "rcvd:MESSAGE_ID"
tokens, which we don't want.
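
To illustrate the behavior we do want, here is a minimal sketch in Python. It is not bogofilter's lexer: the function name received_tokens, the ID_LINE pattern, and the word-splitting regex are all assumptions made for this example. It tags words from Received: headers and their continuation lines with "rcvd:", and skips continuation lines of the form "\tid MESSAGE_ID".

```python
import re

# Assumed pattern for the "\tid MESSAGE_ID" continuation lines (hypothetical).
ID_LINE = re.compile(r"^[ \t]+id\s+\S+")

# Assumed, simplified notion of a token; the real lexer's rules differ.
WORD = re.compile(r"[A-Za-z0-9.\-]+")


def received_tokens(header_lines):
    """Return 'rcvd:'-prefixed tokens from Received: headers, skipping id lines."""
    tokens = []
    in_received = False
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # Continuation line: it belongs to the previous header.
            if not in_received or ID_LINE.match(line):
                continue  # not part of Received:, or an id line to ignore
            text = line
        else:
            # A new header starts here; remember whether it is Received:.
            name, _, text = line.partition(":")
            in_received = name.lower() == "received"
            if not in_received:
                continue
        tokens.extend("rcvd:" + word for word in WORD.findall(text))
    return tokens


headers = [
    "Received: from mail.example.com",
    "\tby mx.example.org",
    "\tid ABC123; Sun, 20 Jul 2003",
    "Subject: hello",
]
print(received_tokens(headers))
```

With the id line skipped, no "rcvd:ABC123" token is produced, while the "from" and "by" lines still contribute tagged tokens.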
Second, I ran a test with 3479 ham and 4608 spam. I used half of each as my
training database, then scored the other half of the messages. With
"rcvd:" tokens, but not the newline code, new and old bogofilter did
equally well on the spam messages (2289 correct, 4 false negatives, 57
unsure), and new bogofilter did slightly better on ham (1688 correct vs
1685 correct, 1 false positive each, and 50 vs 53 unsures).
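
For a quick check of the ham figures, the counts can be tallied as below. This is only arithmetic on the numbers quoted above; the helper name summarize is made up for the example.

```python
def summarize(correct, wrong, unsure):
    """Return (total messages scored, fraction classified correctly)."""
    total = correct + wrong + unsure
    return total, correct / total


# Ham results quoted above: correct, false positives, unsure.
ham_new = summarize(1688, 1, 50)
ham_old = summarize(1685, 1, 53)

print("new: %d scored, %.4f correct" % ham_new)
print("old: %d scored, %.4f correct" % ham_old)
```

Both runs score the same number of ham messages; the difference is 3 messages moving from unsure to correct.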
At present, this patch doesn't look like a keeper :-( Sorry.
David