how to bogotune?

David Relson relson at osagesoftware.com
Thu Sep 30 02:26:03 CEST 2004


Trevor,

Let's work on this together, OK?  I'll explain and you let me know when
it becomes clear.  Then we can work on fixing the man page so it's
intelligible.  OK?

------

Bogofilter uses a BerkeleyDB database for storing token info from the
messages with which it has been trained.  This file is named
"wordlist.db" and is often called "the wordlist".

For tuning, bogotune needs a wordlist with a decent amount of training
history, and it needs some additional (untrained) messages to run the
tuning tests on.  Experience has shown that the wordlist needs the
contents of 500 each of spam and non-spam messages (or more) and that
there also need to be 2000 each of spam and non-spam messages for the
tuning process.  Thus, in total, 5000 messages (2500 spam and 2500
non-spam) is the minimum needed.
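
If you want to check that a mailbox is big enough before starting, here
is a minimal sketch.  It is not part of bogofilter; the function names
count_messages and has_enough are made up for illustration, and it
assumes the mailbox is in mbox format, where every message starts with a
line beginning "From ".

```shell
# count_messages FILE
# Print the number of messages in an mbox file, counted as the number
# of lines that begin with "From " (the mbox message separator).
count_messages() {
    grep -c '^From ' "$1"
}

# has_enough FILE MINIMUM
# Succeed (exit status 0) if FILE holds at least MINIMUM messages.
has_enough() {
    [ "$(count_messages "$1")" -ge "$2" ]
}
```

For example, "has_enough mbox.all.spam 2500 && echo OK" prints OK only
when the (hypothetical) file mbox.all.spam holds the 2500 spam needed.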

Given the 5000 messages, use 500 each of the ham (non-spam) and spam to
build a new wordlist, and use the other 2000 of each for the tuning
part.  The commands are:

   mkdir new_dir
   bogofilter -v -d new_dir -s < mbox.with.500.spam
   bogofilter -v -d new_dir -n < mbox.with.500.ham
   bogotune -vv -d new_dir -s mbox.with.2000.spam -n mbox.with.2000.ham
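
The four mbox files named above have to be carved out of your own mail.
As a rough sketch of one way to do that (again, not part of bogofilter;
split_mbox is a made-up helper, and it assumes mbox format with "From "
separator lines), the following puts the first COUNT messages of a
mailbox in one file and the remainder in another:

```shell
# split_mbox INPUT COUNT FIRST_OUT REST_OUT
# Copy the first COUNT messages of the mbox file INPUT to FIRST_OUT and
# all remaining messages to REST_OUT.  A new message is assumed to start
# at every line beginning with "From ".
split_mbox() {
    # Create (or empty) both output files so they exist even if one
    # of them ends up with no messages.
    : > "$3"
    : > "$4"
    awk -v n="$2" -v first_out="$3" -v rest_out="$4" '
        /^From / { count++ }                       # start of next message
        count <= n + 0 { print > first_out; next } # still within first n
        { print > rest_out }                       # everything after that
    ' "$1"
}
```

With a hypothetical file mbox.all.spam holding 2500 or more spam,
"split_mbox mbox.all.spam 500 mbox.with.500.spam mbox.with.2000.spam"
produces the two spam files used above; do the same for the ham.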



