shipping bogofilter with a pre-built token database

Tom Anderson tanderso at
Tue Jan 16 02:49:36 CET 2007

pna.lists wrote:
> Have you considered shipping (optionally) bogofilter with a pre-built
> token database?
> There are some public ham and spam corpora available, e.g.
> Training bogofilter to exhaustion and shipping the resulting database
> would definitely help some of the new users.

I don't think that would be terribly useful.  Simply turning on 
auto-update with a virgin database and classifying spam and ham as they 
arrive will produce a 60-70% accurate database within a day or two, and 
an 80-90% accurate database within a week, with very few false positives or 
false negatives along the way (mostly unsures).  But if you start out 
with a public database, you'll have to battle against a constant 
onslaught of erroneous classifications before it starts working well. 
I'd rather start off with all unsures and let it learn correctly from 
the beginning.

