troublesome false negative

David Relson relson at osagesoftware.com
Mon Nov 4 13:50:18 CET 2002


Matthias & Greg,

Thank you for running my troublesome msg.1103.txt through your installations.

Last night I rebuilt my wordlists as Robinson lists (MAX_REPEATS of 1) 
using my original *.mbx files (from August) and adding in all the ham and 
spam from live usage in October.  Not precisely the current "live" 
wordlists, but very similar.

This morning I ran the message through the Robinson calculation with ROBX 
values of 0.200 and 0.400, and then through the Graham calculation.  Here 
are the 3 status lines, in that order:

	X-Bogosity: No,  tests=bogofilter, spamicity=0.478232
	X-Bogosity: No,  tests=bogofilter, spamicity=0.495448
	X-Bogosity: Yes, tests=bogofilter, spamicity=1.000000

Looking at the Robinson histograms, changing ROBX from 0.200 to 0.400 moves 
42 words from the 0.20-0.30 line to the 0.40-0.49 line.  That tells me those 
42 words (about 15% of the words) are unknown to my wordlists, since an 
unknown word is scored at the ROBX value.
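
For anyone following along, here is a minimal sketch of why unknown words 
track ROBX.  It assumes the smoothing formula from Gary Robinson's essay, 
f(w) = (s*x + n*p(w)) / (s + n), with x = ROBX and s = ROBS.  The function 
names, the raw-count estimate of p(w), and the ROBS value of 1.0 are 
illustrative assumptions, not bogofilter's actual code or defaults.

```python
def robinson_f(spam_count, ham_count, robx, robs=1.0):
    """Smoothed word probability f(w) = (s*x + n*p) / (s + n).

    robs=1.0 is an assumed value for illustration only.
    p is estimated from raw counts here, which is a simplification.
    """
    n = spam_count + ham_count
    if n == 0:
        # A word never seen in training carries no evidence,
        # so its score is exactly the ROBX prior.
        return robx
    p = spam_count / n
    return (robs * robx + n * p) / (robs + n)


def histogram_line(f):
    """Index of the 0.1-wide histogram line that a score falls on (0..9)."""
    return min(int(f * 10 + 1e-9), 9)


# An unknown word moves from the 0.20 line to the 0.40 line with ROBX.
for robx in (0.200, 0.400):
    f = robinson_f(0, 0, robx)
    print(f"ROBX={robx:.3f}  unknown word f(w)={f:.3f}  line={histogram_line(f)}")

# A word seen 10 times barely moves, because n dominates s.
for robx in (0.200, 0.400):
    print(f"ROBX={robx:.3f}  known word f(w)={robinson_f(9, 1, robx):.3f}")
```

Only words with few or no occurrences in the wordlists shift noticeably, 
which is why counting the words that change lines gives a count of the 
unknown words.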

FWIW, the calculated .ROBX for my wordlist is approx 0.19.  Intuitively, 
this seems much different from 0.400 or 0.415.  Yet, as the status lines 
above show, the resulting spamicity doesn't depend much on the ROBX value.
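
A rough sketch of why the score is not very sensitive to ROBX, assuming the 
geometric-mean combining rule from Gary Robinson's essay: 
P = 1 - prod(1 - f)^(1/n), Q = 1 - prod(f)^(1/n), S = (1 + (P - Q)/(P + Q))/2.  
Whether "bogofilter -r" computes exactly this is an assumption, and the word 
scores below are invented for illustration.  They are not the contents of 
msg.1103.txt.

```python
import math


def robinson_spamicity(scores):
    """Combine per-word scores (each strictly between 0 and 1) into S."""
    n = len(scores)
    # Work in log space so long messages do not underflow.
    p = 1.0 - math.exp(sum(math.log(1.0 - f) for f in scores) / n)
    q = 1.0 - math.exp(sum(math.log(f) for f in scores) / n)
    return (1.0 + (p - q) / (p + q)) / 2.0


def message_scores(robx, unknown=42, known_each=119):
    """Hypothetical message: 42 unknown words scored at ROBX, plus
    made-up known words split evenly between f=0.9 and f=0.1."""
    return [robx] * unknown + [0.9] * known_each + [0.1] * known_each


s_low = robinson_spamicity(message_scores(0.200))
s_high = robinson_spamicity(message_scores(0.400))
print(f"ROBX=0.200  spamicity={s_low:.6f}")
print(f"ROBX=0.400  spamicity={s_high:.6f}")
```

With these invented inputs the two scores land within a few hundredths of 
each other, because the unknown words are a small share of the n-th root 
taken over all words.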

Could the two of you generate the histograms and send them to me?  I use 
"bogofilter -r -v -v < msg.1103.txt" to generate my histograms.

Thanks.

David




