Comparative performance tests?
Peter Bishop
pgb at adelard.com
Fri Jul 11 18:38:20 CEST 2003
Has anyone done any comparative performance tests
on different algorithms since the latest changes
(case sensitivity, etc.)?
The reason I ask is that bogofilter let through yet another variant of
the Nigerian scam (419) spam. The spam was just below the 0.54 cutoff
using the Robinson GM algorithm, but when I tried the same spam using
the Graham algorithm, the spamicity was very close to 1.0 - well above the
Graham default cut-off of 0.9.
The Graham algorithm certainly seems to work well for spams with lots of
fairly normal words (like the Nigerian spam) that pull down the spamicity
score when Robinson or Fisher is used.
From the Paul Graham web site, his algorithm seems to do quite well
with similar extensions to those that are now installed in bogofilter (case
sensitivity, subject line tags, etc.).
So maybe it is time for another look at the relative performance
of the 3 algorithms.
Another (perhaps dangerous) thought is that bogofilter could apply
*several* algorithms and generate a Yes if "any" algorithm generated Yes
- not sure what the X-Bogosity line would look like in that case :-)
I guess you would need to output up to three spamicity values.
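The "Yes if any algorithm says Yes" idea could be sketched like this; the
function names, cutoff values, and the multi-valued X-Bogosity format are all
hypothetical, just one possible shape for such a header:

```python
# Illustrative default cutoffs per algorithm (not bogofilter's config).
CUTOFFS = {"graham": 0.90, "robinson": 0.54, "fisher": 0.95}

def classify(scores):
    """scores: dict mapping algorithm name -> spamicity in [0, 1].
    Flags spam if ANY algorithm exceeds its own cutoff, and builds
    a header line reporting all three spamicity values."""
    is_spam = any(scores[name] >= cut for name, cut in CUTOFFS.items())
    verdict = "Yes" if is_spam else "No"
    detail = ", ".join(f"{name}={scores[name]:.2f}" for name in CUTOFFS)
    return verdict, f"X-Bogosity: {verdict}, {detail}"

# The 419 case from above: Robinson just below cutoff, Graham near 1.0.
verdict, header = classify({"graham": 0.97, "robinson": 0.52, "fisher": 0.40})
print(header)
```

With an any-of-three rule the false-negative rate should drop, at the cost
of inheriting the worst false-positive rate of the three algorithms.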
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk