0.8.0 and 0.10.1.3 classification comparison

Fri Jan 31 14:51:09 CET 2003

Over most of yesterday I had a machine sweeping through min_dev from 0
to 0.3 in steps of 0.025 and robs from 0.01 to 1e-7 in steps of a half
decade.  It did this first with 0.10.1.3 and then with 0.8.0.  Both
runs had robx set to 0.415.  The spam cutoff at each point was the
lower of 0.995 and the cutoff needed to produce zero false positives.
Each run tested 2194 spam and 1086 nonspam; the nonspam had all been
classified as unsure in a qualifying bogofilter run.  The unsures were
chosen because they're the likeliest to produce false positives in the
experiment.  None of the experimental emails had been used in training.
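
As a rough illustration of the sweep and the cutoff rule described above, here is a minimal Python sketch. It is not bogofilter code. The function names are hypothetical, and it assumes that a message counts as spam when its score is greater than or equal to the cutoff. It needs Python 3.9 or later for math.nextafter.

```python
import math

CUTOFF_CEILING = 0.995  # the arbitrary ceiling on the spam cutoff

def min_dev_values():
    # 0 to 0.3 in steps of 0.025 (13 values)
    return [round(i * 0.025, 3) for i in range(13)]

def robs_values():
    # 0.01 down to 1e-7 in half-decade steps (11 values):
    # exponents -2.0, -2.5, ..., -7.0
    return [10 ** (-2 - 0.5 * i) for i in range(11)]

def spam_cutoff(nonspam_scores, ceiling=CUTOFF_CEILING):
    # Smallest cutoff that puts every nonspam score below it,
    # assuming "score >= cutoff" means spam.
    zero_fp_cutoff = math.nextafter(max(nonspam_scores), math.inf)
    # Use the lower of the ceiling and the zero-false-positive cutoff.
    return min(ceiling, zero_fp_cutoff)

def error_counts(spam_scores, nonspam_scores, cutoff):
    # False positives: nonspam at or above the cutoff.
    fp = sum(1 for s in nonspam_scores if s >= cutoff)
    # False negatives: spam below the cutoff.
    fn = sum(1 for s in spam_scores if s < cutoff)
    return fp, fn
```

Under these assumptions, a grid point produces false positives only when some nonspam message scores at or above the 0.995 ceiling; otherwise the cutoff drops to just above the highest nonspam score and fp is zero.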

With 0.8.0 it turned out that my production settings were optimal:
min_dev = 0.1 and robs = 3.2e-7 gave 0 fp and 2.36% fn.

With 0.10.1.3 the best settings were min_dev = 0.05 and robs = 1e-6;
these gave 1.4% fp (rather high) and 1.7% fn, for an overall error
rate of 3.1%.  The fp-fn balance could have been adjusted by raising
the arbitrary ceiling of 0.995 for the spam cutoff; at 0.998, for
example, I got a more reasonable 0.06% fp and 3.0% fn.  In this
experiment, however, it didn't seem that 0.10.1.3 gave greatly improved
overall discrimination with respect to 0.8.0.  (Early in the experiment
it looked as though that might happen, but in the end it wasn't so.)
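
The effect of raising the cutoff ceiling can be sketched with a few made-up scores. The numbers below are purely illustrative and are not data from these runs. The function name is hypothetical, and it again assumes that a score at or above the cutoff is classified as spam.

```python
def fp_fn_at(cutoff, spam_scores, nonspam_scores):
    """Return (false positives, false negatives) at the given cutoff."""
    fp = sum(1 for s in nonspam_scores if s >= cutoff)
    fn = sum(1 for s in spam_scores if s < cutoff)
    return fp, fn

# Made-up scores for illustration only; not data from the runs above.
spam = [0.9999, 0.9990, 0.9970, 0.9960, 0.9000]
nonspam = [0.9975, 0.9965, 0.5000, 0.1000]

for cutoff in (0.995, 0.998):
    fp, fn = fp_fn_at(cutoff, spam, nonspam)
    print(f"cutoff={cutoff}: fp={fp} fn={fn}")
```

With these toy scores the 0.995 cutoff gives fp=2 and fn=1, while 0.998 gives fp=0 and fn=3. Raising the cutoff can only remove false positives and add false negatives, so it shifts the balance without necessarily improving the total.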

This is important, because the 0.10 bogofilter is a lot more expensive
than its predecessors in terms of database bulk and time to process. 
If our goal is to eliminate _all_ spam at any cost, that's ok; no doubt
0.10 will catch some spam that 0.8 couldn't, and probably the overall
discrimination will be as good as that of 0.8 once we do some more
training and get some more experience.  But if the idea is to eliminate
_most_ spam quickly in a busy production environment, well, we may be
moving away from that goal.

One possible development path would be to eliminate _all_ spam first
and then work on cutting the cost.  I'd suggest that it may be harder
to (re)introduce scalability late than to design it in as we go. 
Reintroducing the daemon (I have SMP machines to test it on ;-) might
be a wise move at this point.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |