[bogofilter-announce] bogofilter-0.13.5 - new current release

Greg Louis glouis at dynamicro.on.ca
Tue Jun 3 20:02:52 CEST 2003


On 20030603 (Tue) at 1855:08 +0200, Boris 'pi' Piwinger wrote:

> > Still running 0.13.0 (see below) I see one spam mail every
> > other day or so. I got a false positive when applying for
> > Michael Moore's mailing list a few days ago, but usually,
> > this is not a problem.
> 
> Actually, the new robs makes things worse, i.e., 200% more
> false positives (3 instead of 1 out of 22184) with the same
> number of false negatives (430 out of 12721). 0.0001 brings
> it to 0 and 430.

The value of s should never be less than 0.01, because below that,
words that appear in one list but not in the other are heavily
overweighted in the calculation.  By "heavily overweighted" I mean that,
with s around 1e-6, 3 or 4 such tokens out of 200 can swing the
evaluation from spam to nonspam or vice versa.  At 0.001 or 0.0001 the
effect isn't quite that bad, but you still risk random errors.
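
To make the overweighting concrete, here is a rough sketch in Python of
Robinson's smoothing formula, f(w) = (s*x + n*p) / (s + n), applied to a
token seen once in the nonspam list and never in the spam list.  The
function name, the value x = 0.5 and the count n = 1 are illustrative
assumptions, and bogofilter's actual computation may differ in detail.

```python
import math

def robinson_f(p, n, s, x):
    """Smoothed token probability: f(w) = (s*x + n*p) / (s + n).

    p: raw spam probability of the token (0.0 if seen only in nonspam)
    n: number of times the token has been seen
    s: strength given to the prior x
    x: assumed probability for a token never seen before
    """
    return (s * x + n * p) / (s + n)

X = 0.5  # illustrative prior, not a recommended setting

# One-list token: seen once in nonspam, never in spam, so p = 0 and n = 1.
for s in (1.0, 0.01, 1e-4, 1e-6):
    f = robinson_f(p=0.0, n=1, s=s, x=X)
    # A Fisher-style combination sums log terms over the tokens, so the
    # magnitude of ln f(w) indicates how hard this one token pulls.
    print(f"s={s:g}  f(w)={f:.3g}  ln f(w)={math.log(f):.2f}")
```

The log term grows steadily in magnitude as s shrinks, which is why a
handful of such tokens can outvote the rest of the message.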

When you change the value of s you need to retune the spam cutoff to
get your false-positive count back in line.  There is enough variation
among individual message corpora that we can't make a "one size fits
all" recommendation, but instead of changing s to smaller values, try
increasing the spam cutoff until the false positives go away.
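
As a sketch of that retuning step, assuming you have the scores for a
set of known nonspam and known spam messages in hand, something like the
following Python could step the cutoff upward.  The helper names, the
sample scores, and the rule that a message counts as spam when its score
is >= the cutoff are assumptions for illustration, not bogofilter's own
code.

```python
def count_false_positives(nonspam_scores, cutoff):
    """Number of known-nonspam messages that would be classed as spam."""
    return sum(1 for score in nonspam_scores if score >= cutoff)

def raise_cutoff(nonspam_scores, spam_scores, start, step=0.01, max_fp=0):
    """Raise the cutoff in fixed steps until false positives <= max_fp.

    Returns the cutoff reached and the number of spam messages that
    would be missed (false negatives) at that cutoff.
    """
    cutoff = start
    while cutoff < 1.0 and count_false_positives(nonspam_scores, cutoff) > max_fp:
        # round() keeps floating-point drift out of the stepped value
        cutoff = round(cutoff + step, 10)
    missed = sum(1 for score in spam_scores if score < cutoff)
    return cutoff, missed

# Made-up scores, purely for illustration.
nonspam = [0.02, 0.10, 0.48, 0.97]
spam = [0.96, 0.99, 0.999, 1.0]

cutoff, missed = raise_cutoff(nonspam, spam, start=0.95)
print(f"cutoff={cutoff:.2f}  false negatives={missed}")
```

The trade-off shows up in the second number: each step that removes a
false positive may let a little more spam through, so it is worth
checking both counts on your own corpus.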

Hope that helps......
-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



