[patch] tag Received lines and parse headers better.

David Relson relson at osagesoftware.com
Sun Jul 20 19:21:59 CEST 2003


Michael,

Some more testing results.

First, the code to eat newlines for multi-line headers has a problem.  When 
processing mbx files, the message count is wrong.  Additionally, lines 
matching "\tid MESSAGE_ID" were parsed as continuations of Received: 
headers rather than being ignored.  This results in lots of 
"rcvd:MESSAGE_ID" tokens, which we don't want.
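
To make the intended behaviour concrete, here is a minimal sketch in Python. 
It is not the bogofilter lexer, which is written in C; the names 
received_tokens, ID_CONTINUATION and TOKEN are made up for illustration, and 
the token pattern is an assumption.  It tags tokens from Received: headers 
and their continuation lines with "rcvd:", but skips any continuation line 
of the form "\tid MESSAGE_ID".

```python
import re

# Hypothetical pattern: a continuation line that starts with "id <something>".
ID_CONTINUATION = re.compile(r"^[ \t]+id[ \t]+\S+")
# Assumed token shape: runs of letters, digits, dots and hyphens.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*")


def received_tokens(header_lines):
    """Return "rcvd:"-prefixed tokens taken from Received: headers.

    Lines starting with a space or tab continue the preceding header.
    Continuations of a Received: header are tokenized too, except for
    "id ..." lines, which are dropped whole.
    """
    tokens = []
    in_received = False
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # Continuation line: only relevant inside a Received: header,
            # and ignored when it carries the message id.
            if not in_received or ID_CONTINUATION.match(line):
                continue
            text = line
        else:
            # A new header starts here; remember whether it is Received:.
            name, _, text = line.partition(":")
            in_received = name.lower() == "received"
            if not in_received:
                continue
        tokens.extend("rcvd:" + t for t in TOKEN.findall(text))
    return tokens


if __name__ == "__main__":
    sample = [
        "Received: from mail.example.com",
        "\tby mx.example.org",
        "\tid ABC123",
        "Subject: hello",
    ]
    print(received_tokens(sample))
```

Note that this sketch drops the entire "id" line, so anything following the 
id on that same line is lost as well.  A real fix might want to be more 
selective.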

Second, I ran a test with 3479 ham and 4608 spam.  I used half of each as 
my training database, then scored the other half of the messages.  With 
"rcvd:" tokens, but without the newline code, new and old bogofilter did 
equally well on the spam messages (2289 correct, 4 false negatives, 57 
unsure), and new bogofilter did slightly better on ham (1688 correct vs 
1685 correct, 1 false positive for both, and 50 vs 53 unsures).
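
For anyone who wants to turn the ham counts into rates, a small sketch 
follows.  The helper name summarize is made up for illustration, and it 
assumes the single false positive applies to both the new and the old 
bogofilter.

```python
def summarize(correct, wrong, unsure):
    """Return (total, fraction correct, fraction unsure) for one run."""
    total = correct + wrong + unsure
    return total, correct / total, unsure / total


# Ham counts as reported above: (correct, false positives, unsure).
ham_new = summarize(1688, 1, 50)
ham_old = summarize(1685, 1, 53)

if __name__ == "__main__":
    for label, (total, ok, unsure) in (("new", ham_new), ("old", ham_old)):
        print(f"{label}: {total} ham, {ok:.2%} correct, {unsure:.2%} unsure")
```

Both runs cover the same 1739 ham messages, so the improvement amounts to 
three messages moving from unsure to correct.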

At present, this patch doesn't look like a keeper :-(  Sorry.

David




