Comparative performance tests?

Greg Louis glouis at dynamicro.on.ca
Fri Jul 11 19:32:08 CEST 2003


On 20030711 (Fri) at 1738:20 +0100, Peter Bishop wrote:
> Has anyone done any comparative performance tests
> on different algorithms since the latest changes
> (case sensitivity, etc.)?

There's no point testing Robinson-GM against Robinson-Fisher; they will
perform with identical accuracy provided each is optimally tuned.  The
only difference between those two is the distribution of spam scores,
but the distributions are monotonically related.  The reason for using
Fisher is that its distribution facilitates tuning (given the same s,
x and min_dev, it's less fussy about the exact spam_cutoff value) and
makes it easier to identify unsures with which to train.
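For reference, the Fisher-style combining works roughly like this (a
minimal sketch of the inverse chi-square combining the Robinson-Fisher
mode is based on, not bogofilter's actual code, which adds refinements
such as effective size factors and overflow guards):

```python
import math

def chi2Q(x2, df):
    """Survival function of the chi-square distribution with an even
    number of degrees of freedom df, evaluated at x2."""
    assert df % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def robinson_fisher(probs):
    """Combine per-token spam probabilities with Fisher's method.
    Under the null hypothesis the p's are uniform on (0, 1), so
    -2 * sum(ln p) follows a chi-square with 2n degrees of freedom."""
    n = len(probs)
    # S: evidence the message is spam (rejects the "all ham" hypothesis)
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # H: evidence the message is ham (rejects the "all spam" hypothesis)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Final indicator on (0, 1); 0.5 means the evidence is balanced.
    return (S - H + 1.0) / 2.0
```

One useful property visible here: a message with mixed evidence lands
near 0.5 rather than being forced toward 0 or 1, which is what makes
the "unsure" middle ground usable for training.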

> The reason I ask is that bogofilter let through yet another variant of 
> the Nigerian scam (419) spam. The spam was just below the 0.54 cutoff 
> using the Robinson GM algorithm but when I tried the same spam using
> the Graham algorithm, the spamicity was very close to 1.0 - well above the 
> Graham default cut-off of 0.9.
> 
> The Graham algorithm certainly seems to work well for spams with lots of 
> fairly normal words (like Nigerian spam)  that pull down the spamicity 
> count when Robinson or Fisher is used. 
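(For context: Graham's rule only considers the most extreme tokens, which
is why a sea of neutral words doesn't dilute the score. A rough sketch of
the idea, not Paul Graham's exact code; the parameter name and tie-breaking
are assumptions:)

```python
import math

def graham_score(probs, n_interesting=15):
    """Graham-style combining: keep only the tokens that deviate most
    from the neutral 0.5, then combine them with a naive-Bayes ratio."""
    extreme = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)
    extreme = extreme[:n_interesting]
    prod_spam = math.prod(extreme)
    prod_ham = math.prod(1.0 - p for p in extreme)
    return prod_spam / (prod_spam + prod_ham)
```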

Using a high min_dev will have that effect too, if that's what your
spam corpus needs.  FWIW it's been four months since I last got a
Nigerian spam false negative, and I catch around 40 of those a week in
my personal mail.  (And currently I'm using min_dev of 0.02, a very
low value that looks at nearly all the tokens in the message.)
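In case it helps to picture it: min_dev acts as a selection threshold on
per-token probabilities before combining, something like the following
(a sketch of the idea only; the token names are hypothetical):

```python
def select_tokens(token_probs, min_dev=0.02):
    """Drop tokens whose spam probability lies within min_dev of the
    neutral 0.5; only the survivors feed the combining step.
    (Sketch of the idea behind bogofilter's min_dev setting.)"""
    return {tok: p for tok, p in token_probs.items()
            if abs(p - 0.5) >= min_dev}
```

With min_dev at 0.02 nearly every token survives; a high min_dev keeps
only the strongly hammy or spammy tokens.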

> From Paul Graham's web site, his algorithm seems to do quite well 
> with similar extensions to those that are now installed in bogofilter (case 
> sensitivity, subject line tags, etc). 

With current bogofilter (0.13.7.2), carefully tuned, I'm getting around
0.01% false positives and less than 1% false negatives in production. 
IIRC that's comparable to what Paul gets, except I'm doing it for the
aggregate email of 80 quite disparate users.

> So maybe it is time for another look at the relative performance
> of the 3 algorithms

I suppose it could be interesting, if properly done.  Robinson-GM isn't
worth including, for the reason given above, and it would be important
to tune both methods optimally before comparing them.  I haven't looked
at all into how to optimize Graham's method; the tuning process for
Robinson-Fisher is complex, and I don't doubt that Graham's will need
something similar.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |
