troublesome false negative
David Relson
relson at osagesoftware.com
Mon Nov 4 13:50:18 CET 2002
Matthias & Greg,
Thank you for running my troublesome msg.1103.txt through your installations.
Last night I rebuilt my wordlists as Robinson lists (MAX_REPEATS of 1)
using my original *.mbx files (from August) and adding in all the ham and
spam from live usage in October. Not precisely the current "live"
wordlists, but very similar.
This morning I ran the message through Robinson with ROBX values of 0.200
and 0.400, and then through Graham. Here are the 3 status lines, in that order:
X-Bogosity: No, tests=bogofilter, spamicity=0.478232
X-Bogosity: No, tests=bogofilter, spamicity=0.495448
X-Bogosity: Yes, tests=bogofilter, spamicity=1.000000
Looking at the Robinson histograms, changing the ROBX value moves 42 words
from the 0.20-0.30 line to the 0.40-0.49 line, which tells me that those 42
words (about 15% of the message's words) were previously unknown.
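
A minimal sketch of why unknown words follow ROBX, assuming Robinson's
usual smoothing formula f(w) = (s*x + n*p(w)) / (s + n), where x is ROBX,
s is the strength given to x, n is how often the word has been seen, and
p(w) is its raw spam probability. The function name and the default
s = 1.0 are assumptions for illustration, not bogofilter's actual code.

```python
def robinson_f(p_w, n, x, s=1.0):
    """Robinson's smoothed word value: (s*x + n*p_w) / (s + n)."""
    return (s * x + n * p_w) / (s + n)

# A word that has never been seen has n == 0, so its value collapses to x.
# That is why every unknown word shifts when ROBX changes.
for robx in (0.200, 0.400):
    print(robx, robinson_f(p_w=0.0, n=0, x=robx))
```

With n = 0 the p(w) term drops out entirely, so unknown words would sit
on whichever histogram line contains the ROBX value.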
FWIW, the calculated .ROBX for my wordlist is approx 0.19. Intuitively,
this seems much different from 0.400 or 0.415. Yet as the sample results
above show (0.478232 vs. 0.495448), the spamicity doesn't depend much on
the value.
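
A rough sketch of how little the ROBX choice can matter, assuming
Robinson's geometric-mean combining: P = 1 - (prod(1 - f))^(1/n),
Q = 1 - (prod f)^(1/n), S = (1 + (P - Q)/(P + Q)) / 2. The word values
below are made up for illustration (they are not taken from
msg.1103.txt), and the function name is hypothetical.

```python
import math

def robinson_score(fs):
    """Combine per-word values with Robinson's geometric-mean formula."""
    n = len(fs)
    # Geometric means are computed via logs to avoid underflow on long messages.
    p = 1.0 - math.exp(sum(math.log(1.0 - f) for f in fs) / n)
    q = 1.0 - math.exp(sum(math.log(f) for f in fs) / n)
    return (1.0 + (p - q) / (p + q)) / 2.0

# Hypothetical message: 85 known words plus 15 unknown ones (15% of the total).
known = [0.1, 0.9] * 40 + [0.5] * 5
for robx in (0.200, 0.400):
    # Unknown words take the ROBX value itself.
    print(robx, round(robinson_score(known + [robx] * 15), 3))
```

Because the unknown words are a small share of the message and the score
is an average in log space, doubling ROBX only nudges the result in this
toy case.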
Could the two of you generate the histograms and send them to me? I use
"bogofilter -r -v -v < msg.1103.txt" to generate my histograms.
Thanks.
David