tuning Fisher

Greg Louis glouis at dynamicro.on.ca
Tue Jan 28 23:19:15 CET 2003


The other day I reported getting good results with bogofilter 0.10.1.1
plus fixes, using a min_dev value of 0.25.  It appears I was fooled by
a local minimum.

I've just run a sweep of min_dev from 0 to 0.3 and robs from 0.01 to
1e-7.  For each pair of settings I classified a set of 135 nonspams
that had been rated unsure originally, and took the highest spamicity
value plus 0.000001 as the spam cutoff.  Then I classified a set of 200
spams and counted the false negatives.  I used bogofilter-0.10.1.2 with
its rebuilt db (14,374 spams and 9,718 nonspams).
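
The procedure above can be sketched roughly as follows.  This is only
an illustration in Python, not the script actually used for the sweep:
score_fn is a hypothetical stand-in for running bogofilter on one
message with the given min_dev and robs and returning its spamicity,
and the sketch assumes a message counts as spam when its spamicity is
at or above the cutoff.

```python
def spam_cutoff(nonspam_scores, margin=0.000001):
    """Cutoff just above the highest nonspam score, so no nonspam is flagged."""
    return max(nonspam_scores) + margin


def count_false_negatives(spam_scores, cutoff):
    """Spams scoring below the cutoff would be missed (assumed rule: spam if >= cutoff)."""
    return sum(1 for score in spam_scores if score < cutoff)


def sweep(score_fn, min_devs, robs_values, nonspams, spams):
    """Return {(min_dev, robs): false negative count} for every pair of settings.

    score_fn(message, min_dev, robs) is a hypothetical callable that
    returns the spamicity of one message under those settings.
    """
    results = {}
    for min_dev in min_devs:
        for robs in robs_values:
            ham_scores = [score_fn(m, min_dev, robs) for m in nonspams]
            cutoff = spam_cutoff(ham_scores)
            spam_scores = [score_fn(m, min_dev, robs) for m in spams]
            results[(min_dev, robs)] = count_false_negatives(spam_scores, cutoff)
    return results
```

Picking the pair with the smallest count, for example with
min(results, key=results.get), gives the best setting for that
particular pair of test sets only; it says nothing about other corpora.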

The best results (5 fn out of 200 spams) were obtained either with
min_dev 0.025 and robs 1e-7, or with min_dev 0.05 and robs 3.2e-7 or
1e-7.  This more or less confirms pi's recent findings.  I did see a
local minimum at min_dev 0.225 with robs 0.01 or 0.0032, but it wasn't
as good (12 fn).  The robs value wasn't critical; at min_dev 0.025
any robs up to 0.001 gave only one more fn.  The robx value wasn't
critical either; I tried the scan with our accidental historical 0.415
as well as with a newly calculated 0.762, and the results were the same
either way.
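
For reference, a minimal sketch of how these three parameters enter a
Robinson-style token score: f(w) = (robs * robx + n * p(w)) / (robs + n),
where n is the number of times the token has been seen and p(w) is its
raw spam probability, with tokens ignored when f(w) lies within min_dev
of 0.5.  The function names and the exact boundary handling (>=) are
assumptions for illustration, not bogofilter's actual code.

```python
def smoothed_prob(p, n, robs, robx):
    """Robinson's smoothed token probability f(w).

    p    -- raw spam probability of the token from the training counts
    n    -- number of times the token was seen in training
    robs -- strength given to the prior (the s in Robinson's formula)
    robx -- prior probability assumed for an unseen token (the x)
    """
    return (robs * robx + n * p) / (robs + n)


def is_significant(f, min_dev):
    """Keep a token only if its score is at least min_dev away from 0.5.

    Whether the boundary itself is included is an assumption here.
    """
    return abs(f - 0.5) >= min_dev


# With a very small robs, a token seen even once is scored almost
# entirely by its own p, so the two robx values barely differ.
for robx in (0.415, 0.762):
    print(robx, smoothed_prob(0.9, 1, 1e-7, robx))
```

In this sketch robx only matters for tokens with n = 0 or when robs is
comparable to n, which is at least consistent with the two robx values
giving the same results at such small robs.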

On the other hand, bogofilter-0.8.0 with min_dev 0.1, robx 0.415, robs
5e-7 and spam cutoff 0.95 got no fp and one fn on the same nonspams and
spams.  Of course, the training db was different -- no MIME processing
here.  We may need to take a look at the ROI of MIME processing,
though; so far I seem to be doing better without it, which is what
prompted me to try the quick comparison.  That result suggests that a
more thorough comparison may be useful.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



