Mozilla to use Bayesian spam filtering

Greg Louis glouis at dynamicro.on.ca
Sun Nov 17 13:27:29 CET 2002


On 20021116 (Sat) at 0924:53 -0500, David Relson wrote:
> 
> The SpamBayes project is doing some very, very interesting work on how to 
> evaluate messages.  I think it would be valuable to pool efforts, though I 
> don't know if they'd be interested.  Our project in C offers speed while 
> theirs offers a very nice (IMHO) object oriented language and the most 
> advanced spamicity calculation of which I'm aware.
> 
I drafted a paper yesterday (it's a work in progress) comparing our
geometric-mean calculation with Fisher's method (spambayes uses a
variant of the latter).  The main difference is that with Fisher's
method, a middle ground appears where the algorithm can indicate that
it has inadequate evidence for a decision, so instead of getting a
false negative in such cases, you get an "unknown." It turns out that
training on "unknowns" is a very efficient way to improve the
discrimination, according to spambayes' Tim Peters.  There's also the
fact that clear-cut spams and nonspams tend to have Fisher spamicity
values near 1 or 0, so the choice of spam_cutoff becomes less critical.
With our original Robinson method, by contrast, a change of 0.01 in
spam_cutoff can make a very big difference, the right value is hard to
predict, and you need to review it periodically as the training set
grows.
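
To make the comparison above concrete, here is a rough Python sketch of
the two combining rules.  It is not bogofilter's or spambayes' actual
code: the function names, the even-degrees-of-freedom chi-square helper,
and the two cutoffs in classify() are assumptions chosen for
illustration, and the per-token probabilities are assumed to lie
strictly between 0 and 1.

```python
import math

def robinson_geometric(probs):
    """Robinson's geometric-mean combining, rescaled to the range 0..1."""
    n = len(probs)
    # Geometric means are taken in log space to avoid underflow.
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)
    S = (P - Q) / (P + Q)          # ranges from -1 to 1
    return (1.0 + S) / 2.0

def chi2q(x2, df):
    """Upper-tail chi-square probability; valid for even df only."""
    assert df % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)         # guard against rounding above 1

def fisher(probs):
    """Fisher-style (chi-square) combining of per-token probabilities."""
    n = len(probs)
    H = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    S = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (S - H + 1.0) / 2.0

def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Three-way decision; both cutoffs are hypothetical values."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "nonspam"
    return "unsure"
```

With twenty mildly spammy tokens (each 0.7), robinson_geometric() stays
at about 0.7 while fisher() moves noticeably closer to 1, and an evenly
conflicting or uninformative set of tokens lands at 0.5 under fisher(),
which classify() reports as "unsure".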

My test (http://www.bgl.nu/~glouis/bogofilter/fisher.html) didn't
reveal any large differences in discrimination power between the two
calculation methods, and I don't intend to switch to Fisher's method
immediately.  The spambayes folk, who have more experience with it,
say it can be superior.  Their objective, however, is not a binary
"spam-or-not-spam" decision, nor are they (at this stage, anyway)
inclined to worry about throughput (use on busy systems).  The jury is
still out, IMHO, and more work needs doing -- and we should also
become familiar with spambayes' testing methods and results.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |



