tanderso at oac-design.com
Mon Feb 28 11:13:38 EST 2005
----- Original Message -----
From: "Matthias Andree" <matthias.andree at gmx.de>
> Tom Anderson <tanderso at oac-design.com> writes:
>> Yep, there's no need to find a corpus... just train on error. You'll be
>> golden in a week or less. I'd wager you'll get 50% after the first
>> day. Bogofilter can actually be much better at recognizing ham than
>> spam, so most of your spam will be either filtered or unsure after
>> registering just a few hams.
> If the database has hams exclusively, bogofilter will not score anything
> as spam, every spamicity would be between 0 and a default value near
Sorry if I wasn't clearer... I didn't mean to train _exclusively_ on hams,
only that paying attention to training your hams really allows you to lower
your cutoffs so that spams, and not hams, show up as filtered or unsure
(after training at least some spam). Actually, just registering hams alone
might even be effective if you set your spam cutoff below 0.5. For me,
bogofilter is far better at classifying hams, which land in a narrow band
(0 to 0.01), than spams, which spread from 0.01 to 1.0.
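
To make the cutoff idea concrete, here is a rough Python sketch of a
three-way decision on a spamicity score. It is not bogofilter's actual
code: the function name, the cutoff values, and the sample scores are all
made up for illustration, and the exact boundary handling is an assumption.

```python
def classify(spamicity, ham_cutoff, spam_cutoff):
    """Sort a score into ham / unsure / spam using two cutoffs.

    Hypothetical helper for illustration only; the boundary handling
    (>= for spam, <= for ham) is an assumption, not bogofilter's code.
    """
    if spamicity >= spam_cutoff:
        return "spam"
    if spamicity <= ham_cutoff:
        return "ham"
    return "unsure"


# Made-up cutoffs: hams cluster very near 0, so the ham cutoff can be
# tight and the spam cutoff can sit below 0.5.
HAM_CUTOFF = 0.01
SPAM_CUTOFF = 0.45

# Made-up scores, not measurements from a real wordlist.
samples = {
    "note from a coworker": 0.003,
    "unfamiliar newsletter": 0.20,
    "obvious junk": 0.48,
}

for name, score in samples.items():
    print(f"{name}: {score} -> {classify(score, HAM_CUTOFF, SPAM_CUTOFF)}")
```

With well-trained hams sitting at or below 0.01, nearly everything that
is not ham falls out as unsure or spam, which is the effect described
above.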
Bogofilter mailing list
Bogofilter at bogofilter.org