shipping bogofilter with a pre-built token database

Tom Anderson tanderso at oac-design.com
Tue Jan 16 02:49:36 CET 2007


pna.lists wrote:
> Have you considered shipping (optionally) bogofilter with a pre-built
> token database?
> 
> There are some public ham and spam corpora available, e.g.
> http://spamassassin.apache.org/publiccorpus/
> 
> Training bogofilter to exhaustion and shipping the resulting database
> would definitely help some of the new users.

I don't think that would be terribly useful.  Simply turning on
auto-update with an empty database and classifying spam and ham as they
arrive will produce a 60-70% accurate database within a day or two, and
an 80-90% accurate database within a week.  There are very few false
positives or false negatives along the way; most messages are simply
classified as unsure.  If you start out with a database trained on a
public corpus, though, you'll have to battle a constant onslaught of
erroneous classifications before it starts working well.  I'd rather
start off with all unsures and let it learn correctly from the
beginning.
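
To illustrate why an empty database yields "unsure" rather than wrong
answers, here is a toy sketch in Python.  It is not bogofilter's actual
scoring algorithm: the TinyFilter class, its per-token averaging, and
the 0.9/0.1 cutoffs are all assumptions made up for this example.  The
only point is that a message with no known tokens scores a neutral 0.5,
which falls between the two cutoffs.

```python
from collections import Counter


class TinyFilter:
    """Toy three-state token filter (hypothetical; NOT bogofilter's algorithm)."""

    def __init__(self, spam_cutoff=0.9, ham_cutoff=0.1):
        # Cutoff values are illustrative assumptions, not bogofilter defaults.
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of ham messages containing it
        self.spam_cutoff = spam_cutoff
        self.ham_cutoff = ham_cutoff

    @staticmethod
    def tokens(text):
        # Crude whitespace tokenizer; each token counts once per message.
        return set(text.lower().split())

    def score(self, text):
        # Average the per-token spam fraction over tokens seen in training.
        probs = []
        for tok in self.tokens(text):
            s, h = self.spam[tok], self.ham[tok]
            if s + h:
                probs.append(s / (s + h))
        # With no known tokens there is no evidence either way: stay neutral.
        return sum(probs) / len(probs) if probs else 0.5

    def classify(self, text):
        p = self.score(text)
        if p >= self.spam_cutoff:
            return "spam"
        if p <= self.ham_cutoff:
            return "ham"
        return "unsure"

    def train(self, text, is_spam):
        # Register the message's tokens under the label supplied by the user.
        (self.spam if is_spam else self.ham).update(self.tokens(text))


f = TinyFilter()
print(f.classify("cheap pills now"))      # empty database: unsure
f.train("cheap pills now", is_spam=True)
f.train("meeting notes attached", is_spam=False)
print(f.classify("cheap pills"))          # only spam evidence: spam
print(f.classify("meeting notes"))        # only ham evidence: ham
print(f.classify("cheap meeting"))        # mixed evidence: unsure
```

A database trained on someone else's mail would start out with token
counts that may not match your own mail, which is where the early
misclassifications would come from in this simplified picture.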

Tom




More information about the bogofilter-dev mailing list