shipping bogofilter with a pre-built token database
Tom Anderson
tanderso at oac-design.com
Tue Jan 16 02:49:36 CET 2007
pna.lists wrote:
> Have you considered shipping (optionally) bogofilter with a pre-built
> token database?
>
> There are some public ham and spam corpora available, e.g.
> http://spamassassin.apache.org/publiccorpus/
>
> Training bogofilter to exhaustion and shipping the resulting database
> would definitely help some of the new users.
I don't think that would be terribly useful. Simply turning on
auto-update with a virgin database and classifying spam and ham as they
arrive will produce a 60-70% accurate database within a day or two, and
an 80-90% accurate database within a week, with very few false positives
or false negatives along the way (mostly unsures). But if you start out
with a database trained on public corpora, you'll have to battle a
constant onslaught of erroneous classifications before it starts working
well.
I'd rather start off with all unsures and let it learn correctly from
the beginning.
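
For anyone who wants to try that workflow, here is a rough sketch in
Python, not a tested setup. It assumes bogofilter is on your PATH and
relies only on what the man page documents: -u registers a message right
after classifying it (unsures are not registered), the exit status is 0
for spam, 1 for ham and 2 for unsure, and -Sn / -Ns undo a wrong
registration and register the right class. The function names are made
up for illustration.

```python
import subprocess

# Exit statuses as documented in the bogofilter man page.
EXIT_LABELS = {0: "spam", 1: "ham", 2: "unsure"}


def label_for_exit_code(code):
    """Map a bogofilter exit status to a label; anything else is an error."""
    return EXIT_LABELS.get(code, "error")


def classify_and_learn(message_bytes, bogofilter="bogofilter"):
    """Classify one message with auto-update (-u) turned on.

    With -u, bogofilter registers the message as spam or ham right after
    classifying it; messages scored "unsure" are not registered.
    """
    result = subprocess.run([bogofilter, "-u"], input=message_bytes)
    return label_for_exit_code(result.returncode)


def correction_args(wrong_label):
    """Flags that undo a wrong auto-registration and register the right class."""
    # -S / -N unregister as spam / ham; -n / -s register as ham / spam.
    return {"spam": ["-Sn"], "ham": ["-Ns"]}[wrong_label]
```

Messages that come back "unsure" still have to be registered by hand with
-s or -n, which is where most of the early training effort goes.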
Tom