how to bogotune?

David Relson relson at osagesoftware.com
Thu Sep 30 01:07:05 CEST 2004


On Wed, 29 Sep 2004 18:24:38 -0300
Trevor Smith wrote:

> On September 29, 2004 4:05 pm, tallison at tacocat.net wrote:
> > You might run up a script to find the low scoring spam or high
> > scoring ham and verify that those are classified (by you) correctly.
> 
> I do not understand what that means. I no longer possess any messages
> that I have used to train bogofilter with. I delete them after they
> have been classified and registered.

Hi Trevor,

As you know by now, bogofilter's scoring algorithm is based on a number
of different parameters.  Changing their values appropriately can make
bogofilter work better for your mail; bad changes will make it worse.

Bogotune is a program that searches for the best combination of
parameter values it can find.  To do this, it needs a large set of spam
and ham messages against which to test a variety of parameter values.
It looks for a set of values that gives a low level of false positives
(ham flagged as spam, around 0.1%) and the minimum number of false
negatives (spam not caught).  For such a search to be meaningful,
bogotune must be given at least 2000 ham and 2000 spam messages to use
for the scoring tests.
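To make that trade-off concrete, here is a minimal sketch in Python.  It
is not bogotune's code: bogotune tunes several bogofilter parameters at
once, while this sketch simplifies the search to a single knob, the spam
cutoff, and assumes the message scores have already been computed.  The
function name pick_spam_cutoff is hypothetical.  It keeps false positives
at or below about 0.1% of the ham and, subject to that, minimizes false
negatives.

```python
from bisect import bisect_left
import math

MIN_MESSAGES = 2000   # minimum per class, as recommended above
MAX_FP_RATE = 0.001   # target false-positive rate (about 0.1%)

def pick_spam_cutoff(ham_scores, spam_scores, max_fp_rate=MAX_FP_RATE):
    """Return (cutoff, false_positives, false_negatives) for the lowest
    cutoff whose false-positive count stays within max_fp_rate.

    A message is treated as spam when its score is >= cutoff.
    """
    if len(ham_scores) < MIN_MESSAGES or len(spam_scores) < MIN_MESSAGES:
        raise ValueError("need at least %d ham and %d spam messages"
                         % (MIN_MESSAGES, MIN_MESSAGES))
    ham = sorted(ham_scores)
    spam = sorted(spam_scores)
    allowed_fp = math.floor(max_fp_rate * len(ham))
    # Candidate cutoffs: every observed score, plus infinity ("call
    # nothing spam"), which always satisfies the false-positive limit.
    candidates = sorted(set(ham) | set(spam)) + [math.inf]
    for cutoff in candidates:
        fp = len(ham) - bisect_left(ham, cutoff)  # ham scoring >= cutoff
        if fp <= allowed_fp:
            fn = bisect_left(spam, cutoff)        # spam scoring < cutoff
            return cutoff, fp, fn
```

Raising the cutoff can only lower false positives and raise false
negatives, so the first (lowest) cutoff that meets the false-positive
limit is the one that misses the least spam.  Sorting once and using
bisect keeps the search fast even for several thousand messages.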

When it starts up, it reads the tuning messages into memory, finds the
word counts (from the wordlist) and is ready to go.  As a first sanity
check, it scores the test messages to see whether you've classified
them correctly, i.e. whether the ham you've supplied actually gets
hammish (low) scores and the spam gets spammish (high) scores.  If there are too
many ham scoring above 0.9 or too many spam scoring below 0.1, bogotune
can't provide meaningful results.
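A check of that kind, which is also the "script to find the low scoring
spam or high scoring ham" suggested earlier in the thread, might look
like the Python sketch below.  It assumes you already have a score for
each message.  The 0.9 and 0.1 limits come from the description above;
the max_bad_fraction default of 1% is a hypothetical value, not
bogotune's actual "too many" limit, and find_suspects is an illustrative
name.

```python
HAM_SUSPECT_ABOVE = 0.9    # ham scoring above this looks misclassified
SPAM_SUSPECT_BELOW = 0.1   # spam scoring below this looks misclassified

def find_suspects(ham_scores, spam_scores, max_bad_fraction=0.01):
    """Return (suspect_ham, suspect_spam, ok).

    suspect_ham / suspect_spam are the indexes of messages worth
    re-checking by hand; ok is False when either list is larger than
    max_bad_fraction (a hypothetical limit) of its class.
    """
    suspect_ham = [i for i, s in enumerate(ham_scores)
                   if s > HAM_SUSPECT_ABOVE]
    suspect_spam = [i for i, s in enumerate(spam_scores)
                    if s < SPAM_SUSPECT_BELOW]
    ok = (len(suspect_ham) <= max_bad_fraction * len(ham_scores)
          and len(suspect_spam) <= max_bad_fraction * len(spam_scores))
    return suspect_ham, suspect_spam, ok
```

Any message it lists is worth a second look: either it really was
misfiled, or it is a genuinely hard case that will drag down the tuning.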

Bogofilter works well without the optimization that bogotune provides;
bogotune is for squeezing the maximum performance out of bogofilter.
However, as described above, bogotune needs a significant number of
messages to work with.  So, if you want to run bogotune, save your
(otherwise unneeded) messages.
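One way to keep them is to append each message to a ham or spam mbox
file as you classify and register it.  The Python sketch below uses the
standard library's mailbox module; the file names ham.mbox and spam.mbox
and the function save_for_tuning are hypothetical, and it assumes your
bogotune accepts mbox files as input (check its documentation for the
formats it reads).

```python
import mailbox
import os

# Hypothetical file names; use whatever layout suits your setup.
HAM_MBOX = "ham.mbox"
SPAM_MBOX = "spam.mbox"

def save_for_tuning(raw_message, is_spam, directory):
    """Append one already-classified message (raw bytes) to the ham or
    spam mbox in `directory`, so it is still around when you run bogotune."""
    name = SPAM_MBOX if is_spam else HAM_MBOX
    box = mailbox.mbox(os.path.join(directory, name))  # created if missing
    box.lock()
    try:
        box.add(raw_message)
    finally:
        box.close()  # flushes to disk, releases the lock, closes the file
```

Calling this right after you register a message with bogofilter means
the same messages you trained on are available later as tuning data.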

HTH,

David


