ANNOUNCE: new Robinson-variant version of bogofilter

Greg Louis glouis at dynamicro.on.ca
Thu Oct 24 19:35:25 CEST 2002


As most readers of this list will be aware, Gary Robinson published a
commentary on Paul Graham's Bayesian calculation in a paper at
http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
and suggested some changes in the calculation method.  In the past I've
submitted patches against bogofilter-0.7.4 and bogofilter-0.7.5 that
implement two of Gary's suggestions.  Since then, David Relson and I
have done some work to incorporate this code into the bogofilter cvs
tree, with the result that there are now three new options: -g (to run
the original Graham calculations, which is the default); -r (to run
Gary's f(w) and S calculations); and -R (which implies -r and causes
bogofilter to print out a table showing the progress of the
calculations, to help investigate misclassification errors or just to
check that the program is calculating correctly).
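
For readers who have not seen the two calculations, here is a minimal
sketch of f(w) and S as I understand them from Gary's essay.  It is not
bogofilter's own code: the function names are hypothetical, and the
default values for s (the weight given to the prior) and x (the assumed
probability for a word never seen before) are placeholders chosen for
illustration, not the values bogofilter uses.

```python
import math


def f_w(p_w, n, s=1.0, x=0.5):
    """Robinson's f(w): pull the raw estimate p(w) toward the prior x.

    p_w -- raw spam probability for the word, between 0 and 1
    n   -- number of messages in which the word has been seen
    s   -- weight given to the prior (illustrative default)
    x   -- assumed probability for an unseen word (illustrative default)
    """
    return (s * x + n * p_w) / (s + n)


def robinson_s(fw_values):
    """Combine per-word f(w) values into one indicator S between 0 and 1.

    Each value must lie strictly between 0 and 1, which f_w guarantees
    when s > 0 and 0 < x < 1.
    """
    n = len(fw_values)
    if n == 0:
        return 0.5  # no evidence either way
    # Geometric means are taken in log space to avoid underflow.
    p = 1.0 - math.exp(sum(math.log(1.0 - f) for f in fw_values) / n)
    q = 1.0 - math.exp(sum(math.log(f) for f in fw_values) / n)
    # (p - q) / (p + q) lies in [-1, 1]; rescale it to [0, 1].
    return (1.0 + (p - q) / (p + q)) / 2.0
```

Values of S near 1 suggest spam and values near 0 suggest non-spam; a
word seen only once or twice contributes an f(w) close to x, so rare
words carry less weight than they would with an unadjusted p(w).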

Ever since Gary published his essay and I modified an early bogofilter
to try his suggestions, it's been clear that with my training and test
data, better discrimination can be achieved with the modified
calculation than with the original code.  As a result, I've been using
Gary's S and f(w) improvements in production and I no longer run the
original calculation method at all.  I've therefore been maintaining
two bogofilter trees, one that tracks the cvs code with its
-r and -g options, and one outside the project that supports the
Robinson calculations only.  Apart from a couple of cosmetic wording
changes in the -v output, dropping the Graham calculation is the only
difference between the two versions (cvs, as of this writing, and my own).

For the convenience of anyone who's interested, I've set up a web page
at http://www.bgl.nu/~glouis/bogofilter with a brief history of my
involvement, and with links to my Robinson-only patch against
bogofilter-0.7.5 and to a changelog.  I've used my initials to
differentiate the Robinson-method-only patch - the current one is
bogofilter-0.7.5-gl3.patch - from the mainstream release.

Those who, like me, are seeing better results with Robinson's method
may like to use my patch, especially if many instances of bogofilter
are likely to run on their mail server.  Those who want
to compare the two calculation methods should pull the bogofilter cvs
from SourceForge and use that instead.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |



