garbage removal and 'outsiders noise'
Jim Correia
jim.correia at pobox.com
Wed Apr 16 19:37:53 CEST 2003
On Wednesday, April 16, 2003, at 01:22 PM, David Relson wrote:
> His most recent test, "Bogofilter parameters(continued)", shows that
> using different parameters can have a major effect in making
> bogofilter more accurate.
>
> I ran a series of tests using my mail. I trained bogofilter with
> 6,173 spam and 18,784 ham and then scored 4,317 spam and 9,567 ham.
> For each set of parameters tested, spam_cutoff was chosen to give
> approx 0.2% false positives. The number of false negatives varied
> from a high of 290 to a low of 60.
>
> Conclusion: using a site's email to determine the best parameters for
> bogofilter can have a _big_ effect.
>
> Corollary: do a thorough test of algorithmic/parametric changes to
> determine whether they are helpful or harmful.
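
For reference, the test procedure quoted above (pick spam_cutoff so that roughly 0.2% of ham is misclassified, then count the false negatives) could be sketched along the following lines. This is only an illustration, not bogofilter's own code: the function names are hypothetical, the scores are assumed to have been collected separately as one spamicity value per message, and treating "score >= cutoff" as spam is an assumption of the sketch.

```python
def false_positive_rate(ham_scores, cutoff):
    """Fraction of ham messages that would be classified as spam."""
    return sum(s >= cutoff for s in ham_scores) / len(ham_scores)


def false_negatives(spam_scores, cutoff):
    """Number of spam messages that would slip through as ham."""
    return sum(s < cutoff for s in spam_scores)


def pick_cutoff(ham_scores, target_fp_rate=0.002, steps=1000):
    """Lowest cutoff on a grid of steps+1 values in [0, 1] whose
    false-positive rate does not exceed target_fp_rate.
    Returns None if no grid point qualifies."""
    for i in range(steps + 1):
        cutoff = i / steps
        if false_positive_rate(ham_scores, cutoff) <= target_fp_rate:
            return cutoff
    return None


if __name__ == "__main__":
    # Tiny made-up score lists, only to show the calls.
    ham = [0.2, 0.3, 0.4, 0.6]
    spam = [0.3, 0.55, 0.9, 0.99]
    cutoff = pick_cutoff(ham, target_fp_rate=0.25, steps=10)
    print(cutoff, false_negatives(spam, cutoff))
```

Repeating this for each parameter set, with the false-positive rate held fixed, gives false-negative counts that can be compared directly, which is the kind of comparison the quoted test reports.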
Is it naive of me to be running bogofilter with the defaults?
I'm running with spam/ham classification, cutoff of 0.95.
(I notice that some of the false negatives are close to the cutoff, but
most are numerically far from it, so perhaps I am answering my own
question :-)
This catches about 90% of my spam (I retrain with -Ns on the false
negatives), and I haven't had a false positive yet.
At present there are 18659 good messages and 1569 spam messages in the
respective wordlists.
Jim