garbage removal and 'outsiders noise'

Jim Correia jim.correia at pobox.com
Wed Apr 16 19:37:53 CEST 2003


On Wednesday, April 16, 2003, at 01:22  PM, David Relson wrote:

> His most recent test, "Bogofilter parameters(continued)", shows that 
> using different parameters can have a major effect in making 
> bogofilter more accurate.
>
> I ran a series of tests using my mail.  I trained bogofilter with 
> 6,173 spam and 18,784 ham and then scored 4,317 spam and 9,567 ham.  
> For each set of parameters tested, spam_cutoff was chosen to give 
> approx 0.2% false positives.  The number of false negatives varied 
> from a high of 290 to a low of 60.
>
> Conclusion: using a site's email to determine the best parameters for 
> bogofilter can have a _big_ effect.
>
> Corollary: do a thorough test of algorithmic/parametric changes to 
> determine whether they are helpful or harmful.
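
For anyone wanting to try the same kind of comparison, here is a rough 
sketch of the "pick spam_cutoff for a target false positive rate, then 
count false negatives" step described above. It is not bogofilter code. 
It assumes you already have lists of scores for known ham and known spam, 
and that a message counts as spam when its score is at or above the 
cutoff. The function names are made up for illustration, and 
math.nextafter needs Python 3.9 or later.

```python
import math


def cutoff_for_fp_rate(ham_scores, target_fp_rate):
    """Lowest cutoff that keeps ham false positives within the target rate.

    Assumption: a message is called spam when score >= cutoff.
    """
    # Number of ham messages we are willing to misclassify.
    allowed = int(len(ham_scores) * target_fp_rate)
    ranked = sorted(ham_scores, reverse=True)
    if allowed >= len(ranked):
        return 0.0
    # The (allowed + 1)-th highest ham score must stay below the cutoff,
    # so place the cutoff just above it.
    return math.nextafter(ranked[allowed], math.inf)


def count_false_positives(ham_scores, cutoff):
    """Ham messages that would be classified as spam."""
    return sum(1 for score in ham_scores if score >= cutoff)


def count_false_negatives(spam_scores, cutoff):
    """Spam messages that would be classified as ham."""
    return sum(1 for score in spam_scores if score < cutoff)
```

Running this once per parameter set, with the same target rate each 
time, gives false negative counts that can be compared directly, which 
is the comparison the quoted test makes.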

Is it naive of me to be running bogofilter with the defaults?

I'm running with spam/ham classification, cutoff of 0.95.

(I notice that some of the false negatives are close to the cutoff, but 
most are numerically far from it, so perhaps I am answering my own 
question :-)

This catches about 90% of my spam (I retrain with -Ns on the false 
negatives), and I haven't had a false positive yet.

At present there are 18659 good messages and 1569 spam messages in the 
respective wordlists.

Jim




