tuning Fisher
Greg Louis
glouis at dynamicro.on.ca
Tue Jan 28 23:19:15 CET 2003
The other day I reported getting good results with bogofilter 0.10.1.1
plus fixes, using a min_dev value of 0.25. It appears I was fooled by
a local minimum.
I've just run a sweep of min_dev from 0 to 0.3 and robs from 0.01 to
1e-7. For each pair of settings I classified a set of 135 nonspams
that had been rated unsure originally, and took the highest spamicity
value plus 0.000001 as the spam cutoff. Then I classified a set of 200
spams and counted the false negatives. I used bogofilter-0.10.1.2 with
its rebuilt db (14,374 spams and 9,718 nonspams).
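As a rough illustration of the procedure just described, the sweep can be sketched in Python. This is only a sketch under stated assumptions: `score_fn` is a hypothetical stand-in for running bogofilter on one message with a given min_dev and robs and reading back its spamicity, and the rule that a message counts as spam when its spamicity is at or above the cutoff is likewise an assumption. None of these names come from bogofilter itself.

```python
def cutoff_from_nonspams(nonspam_scores, margin=0.000001):
    """Spam cutoff = highest nonspam spamicity plus a small margin,
    so that none of the nonspams is classified as spam."""
    return max(nonspam_scores) + margin


def count_false_negatives(spam_scores, cutoff):
    """Assumption: a message is called spam when its score >= cutoff,
    so a spam scoring below the cutoff is a false negative."""
    return sum(1 for score in spam_scores if score < cutoff)


def sweep(min_devs, robs_values, score_fn, nonspams, spams):
    """Return {(min_dev, robs): false-negative count} for every pair.

    score_fn(message, min_dev, robs) is a hypothetical callable that
    returns the spamicity of one message under those settings.
    """
    results = {}
    for min_dev in min_devs:
        for robs in robs_values:
            nonspam_scores = [score_fn(m, min_dev, robs) for m in nonspams]
            cutoff = cutoff_from_nonspams(nonspam_scores)
            spam_scores = [score_fn(m, min_dev, robs) for m in spams]
            results[(min_dev, robs)] = count_false_negatives(spam_scores, cutoff)
    return results
```

Because the cutoff is recomputed for each pair of settings, the false-positive count on the nonspam set is zero by construction, and the false-negative count alone ranks the settings.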
The best results (5 fn out of 200 spams) were obtained either with
min_dev 0.025 and robs 1e-7, or with min_dev 0.05 and robs 3.2e-7 or
1e-7. This more or less confirms pi's recent findings. I did see a
local minimum at min_dev 0.225 with robs 0.01 or 0.0032, but it wasn't
as good (12 fn). The robs values weren't critical; at min_dev 0.025
anything up to 0.001 gave only one more fn. The robx value wasn't
critical either; I tried the scan with our accidental historical 0.415
as well as with a newly calculated 0.762, and the results were the same
either way.
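One plausible reason robx matters so little at these robs values can be seen from Gary Robinson's smoothing formula, f(w) = (s*x + n*p(w)) / (s + n), with s = robs, x = robx, n the number of training messages containing the token, and p(w) the raw spam probability of the token. The sketch below assumes that bogofilter applies the formula in this form and that min_dev drops tokens whose f(w) lies within min_dev of 0.5; neither detail has been checked against the 0.10.1.2 source, and the function names are illustrative only.

```python
def robinson_f(p_w, n, robx, robs):
    """Robinson's smoothed token probability.

    p_w  : raw spam probability of the token from the training counts
    n    : number of training messages in which the token was seen
    robx : value assumed for a token never seen before
    robs : strength given to robx relative to the observed data
    """
    return (robs * robx + n * p_w) / (robs + n)


def is_used(f_w, min_dev):
    """Assumption: a token only contributes to the score if f(w)
    deviates from the neutral 0.5 by at least min_dev."""
    return abs(f_w - 0.5) >= min_dev


# With robs = 1e-7, a token seen 10 times gets almost the same f(w)
# for either robx value tried in the scan.
a = robinson_f(0.9, 10, robx=0.415, robs=1e-7)
b = robinson_f(0.9, 10, robx=0.762, robs=1e-7)
print(abs(a - b))
```

With robs this small, robx only dominates for tokens that have never been seen (n = 0), where f(w) equals robx exactly.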
On the other hand, bogofilter-0.8.0 with min_dev 0.1, robx 0.415, robs
5e-7 and spam cutoff 0.95 got no fp and one fn on the same nonspams and
spams. Of course, the training db was different -- no MIME processing
in that setup. We may need to take a look at the ROI of MIME processing,
though; so far I seem to be doing better without it, which is what
prompted me to try the quick comparison. That result suggests that a
more thorough comparison may be useful.
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
| Help free our mailboxes. Include |
| http://wecanstopspam.org in your signature. |