Comparative performance tests?

Peter Bishop pgb at adelard.com
Fri Jul 11 18:38:20 CEST 2003


Has anyone done any comparative performance tests
on different algorithms since the latest changes
(case sensitivity, etc.)?

The reason I ask is that bogofilter let through yet another variant of 
the Nigerian scam (419) spam. The spam was just below the 0.54 cutoff 
using the Robinson GM algorithm, but when I tried the same spam using
the Graham algorithm, the spamicity was very close to 1.0 - well above the 
Graham default cut-off of 0.9.

The Graham algorithm certainly seems to work well for spams with lots of 
fairly normal words (like Nigerian spam) that pull down the spamicity 
score when Robinson or Fisher is used. 

From Paul Graham's web site, his algorithm seems to do quite well 
with extensions similar to those that are now installed in bogofilter (case 
sensitivity, subject line tags, etc.). 

So maybe it is time for another look at the relative performance
of the three algorithms.

Another (perhaps dangerous) thought is that bogofilter could apply 
*several* algorithms and generate a Yes if "any" algorithm generated Yes
- I'm not sure what the X-Bogosity line would look like in that case :-)
- I guess you would need to output up to three spamicity values, something
like the sketch below.
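Purely as a sketch of the idea (hypothetical Python, not a patch; the
header layout and the Fisher cutoff are made up), each algorithm would
keep its own cutoff, the verdicts would be OR-ed together, and the
X-Bogosity line could list all the spamicities:

    # Example per-algorithm cutoffs; 0.9 and 0.54 are the values mentioned
    # above, the Fisher value is just a placeholder.
    CUTOFFS = {"graham": 0.9, "robinson-gm": 0.54, "fisher": 0.95}

    def classify_all(scores):
        # scores: spamicity per algorithm, e.g. {"graham": 0.97, ...}
        # Verdict is Yes if *any* algorithm exceeds its own cutoff.
        is_spam = any(score >= CUTOFFS[name] for name, score in scores.items())
        details = ", ".join(f"{name}={score:.6f}" for name, score in scores.items())
        return f"X-Bogosity: {'Yes' if is_spam else 'No'}, tests={details}"

    print(classify_all({"graham": 0.97, "robinson-gm": 0.52, "fisher": 0.61}))
    # X-Bogosity: Yes, tests=graham=0.970000, robinson-gm=0.520000, fisher=0.610000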

-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





