[patch] tag Received lines and parse headers better.
    David Relson 
    relson at osagesoftware.com
       
    Sun Jul 20 19:21:59 CEST 2003
    
    
  
Michael,
Some more testing results.
First, the code to eat newlines for multi-line headers has a problem.  When 
processing mbx files, the message count is wrong.  Additionally, lines 
matching "\tid MESSAGE_ID" were parsed as continuations of Received: headers, 
rather than being ignored.  This results in lots of "rcvd:MESSAGE_ID" 
tokens, which we don't want.
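To illustrate the behaviour we want, here is a minimal Python sketch. It is not bogofilter's lexer code; the function name received_tokens, the word regex, and the sample addresses are all assumptions made up for the example. It tags words from Received: headers with "rcvd:" and skips any continuation line that begins with "id", so no "rcvd:MESSAGE_ID" tokens are produced.

```python
import re

# Hypothetical pattern for a continuation line of the form "\tid MESSAGE_ID".
ID_CONTINUATION = re.compile(r"^[ \t]+id[ \t]+\S+", re.IGNORECASE)
# Hypothetical, simplified notion of a "word" for this example only.
WORD = re.compile(r"[A-Za-z0-9.\-]+")


def received_tokens(header_lines):
    """Return "rcvd:"-tagged tokens from Received: headers.

    Continuation lines (leading space or tab) are attributed to the
    preceding header; those starting with "id" are ignored entirely.
    """
    tokens = []
    in_received = False
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # Continuation of the previous header.
            if not in_received or ID_CONTINUATION.match(line):
                continue
            text = line
        else:
            # Start of a new header: remember whether it is Received:.
            name, _, text = line.partition(":")
            in_received = name.strip().lower() == "received"
            if not in_received:
                continue
        tokens.extend("rcvd:" + word for word in WORD.findall(text))
    return tokens


sample = [
    "Received: from mail.example.com by mx.example.org",
    "\tid ABC123; Sun, 20 Jul 2003",
    "\tfor <user@example.org>",
    "Subject: hello",
]
toks = received_tokens(sample)
```

Note that the sketch drops the whole "id" line, including anything that follows the message id on that line; whether that is acceptable would need checking against real mail.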
Second, I ran a test with 3479 ham and 4608 spam.  I used half of each as my 
training database, then scored the other half of the messages.  With 
"rcvd:" tokens, but without the newline code, new and old bogofilter did 
equally well on the spam messages (2289 correct, 4 false negatives, 57 unsure), 
and new bogofilter did slightly better on ham (1688 correct vs 1685 
correct, 1 false positive for both, and 50 vs 53 unsures).
At present, this patch doesn't look like a keeper :-(  Sorry.
David