bogotune

Greg Louis glouis at dynamicro.on.ca
Tue Mar 2 12:53:13 CET 2004


On 20040301 (Mon) at 2252:41 -0500, Tom Allison wrote:
> "The wordlist contains 2075 non-spam and 968 spam messages.
> Bogotune must be run with at least 2000 of each."
> 
> I'm curious.  I moved up from ~600 to ~900 in the last 2 weeks (est.).
> 
> I'm assuming that my shortcoming is not the email bodies, but the number 
> of word tokens in my database.

More or less.  Bogotune assumes you have enough tokens once you've used
at least 2,000 spam and 2,000 nonspam messages to build the training
database (usually called wordlist.db).
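
To make the arithmetic concrete, here is a rough Python sketch of that
check.  It is not part of bogotune; the function name is made up for
illustration, and the 150-per-week rate is only an assumption based on
your estimate of going from ~600 to ~900 spam in two weeks.

```python
# Minimum per-class message count quoted in the bogotune message.
BOGOTUNE_MINIMUM = 2000


def bogotune_shortfall(spam, nonspam, minimum=BOGOTUNE_MINIMUM):
    """Return how many more messages of each kind are still needed.

    Hypothetical helper for illustration only; bogotune does its own check.
    """
    return {
        "spam": max(0, minimum - spam),
        "nonspam": max(0, minimum - nonspam),
    }


# Counts from the quoted message: 2075 non-spam and 968 spam.
shortfall = bogotune_shortfall(spam=968, nonspam=2075)
print(shortfall)  # {'spam': 1032, 'nonspam': 0}

# Assumed arrival rate: about 300 new spam in two weeks, i.e. 150 per week.
SPAM_PER_WEEK = 150
print(round(shortfall["spam"] / SPAM_PER_WEEK, 1))  # 6.9 (weeks)
```

At that assumed rate the spam side would reach 2,000 in roughly seven
weeks; the nonspam side already has enough.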
 
> This might be until May before I have enough...

The bad news is that bogotune doesn't get really accurate until one
uses something like ten times that many (roughly 20,000 of each).
Before then, it gives rough indications that should be verified before
being put into production.  (Actually, that kind of sanity check is a
good idea anyway.)

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



