shipping bogofilter with a pre-built token database
tanderso at oac-design.com
Mon Jan 15 20:49:36 EST 2007
> Have you considered shipping (optionally) bogofilter with a pre-built
> token database?
> There are some ham and spam public corpuses available, e.g.
> Training bogofilter to exhaustion and shipping the resulting database
> would definitely help some of the new users.
I don't think that would be terribly useful. Simply turning on
auto-update with a virgin database and classifying spam and ham as they
arrive will produce a 60-70% accurate database within a day or two, and
an 80-90% accurate database within a week. You'll see very few false
positives or false negatives along the way (mostly unsures). But if you
start out
with a public database, you'll have to battle against a constant
onslaught of erroneous classifications before it starts working well.
I'd rather start off with all unsures and let it learn correctly from
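
A rough sketch of the workflow described above, for anyone who wants to script it. It assumes bogofilter is on the PATH and relies only on what the man page documents: -u registers a message under the verdict it was given (unsures are not registered), the exit status is 0 for spam, 1 for non-spam, 2 for unsure, and -s, -n, -Sn, -Ns handle registration and corrections. The function names are made up for illustration, and this is not necessarily how my own setup is wired.

```python
import subprocess

# Exit statuses as documented in the bogofilter man page.
EXIT_LABELS = {0: "spam", 1: "ham", 2: "unsure"}


def label_from_exit_code(code):
    """Map a bogofilter exit status to a label; anything else is an error."""
    return EXIT_LABELS.get(code, "error")


def classify_and_update(message_bytes, bogofilter="bogofilter"):
    """Classify one message from stdin with auto-update (-u) turned on.

    Hypothetical helper: with -u, bogofilter registers the message as
    spam or ham according to its own verdict and skips unsures.
    """
    result = subprocess.run(
        [bogofilter, "-u"],
        input=message_bytes,
        stdout=subprocess.DEVNULL,
    )
    return label_from_exit_code(result.returncode)


def correction_flag(verdict, actual):
    """Return the flag needed to fix the database, or None if it is right.

    verdict is what bogofilter said; actual is what the user says the
    message really was ("spam" or "ham").
    """
    if verdict == actual:
        return None       # auto-update already registered it correctly
    if verdict == "unsure":
        # Unsures were never registered, so a plain registration is enough.
        return "-s" if actual == "spam" else "-n"
    if verdict == "spam" and actual == "ham":
        return "-Sn"      # unregister as spam, register as ham
    if verdict == "ham" and actual == "spam":
        return "-Ns"      # unregister as ham, register as spam
    raise ValueError("unexpected verdict/actual pair")
```

With an empty database nearly everything comes back unsure at first, so most early corrections are plain -s or -n registrations; -Sn and -Ns are only needed once the filter starts committing to verdicts.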