garbage removal and 'outsiders noise'

David Relson relson at osagesoftware.com
Wed Apr 16 19:41:38 CEST 2003


At 01:37 PM 4/16/03, Jim Correia wrote:

>On Wednesday, April 16, 2003, at 01:22  PM, David Relson wrote:
>
>>His most recent test, "Bogofilter parameters(continued)", shows that 
>>using different parameters can have a major effect in making bogofilter 
>>more accurate.
>>
>>I ran a series of tests using my mail.  I trained bogofilter with 6,173 
>>spam and 18,784 ham and then scored 4,317 spam and 9,567 ham.
>>For each set of parameters tested, spam_cutoff was chosen to give approx 
>>0.2% false positives.  The number of false negatives varied from a high 
>>of 290 to a low of 60.
>>
>>Conclusion: using a site's email to determine the best parameters for 
>>bogofilter can have a _big_ effect.
>>
>>Corollary: do a thorough test of algorithmic/parametric changes to 
>>determine whether they are helpful or harmful.
>
>Is it naive of me to be running bogofilter with the defaults?
>
>I'm running with spam/ham classification, cutoff of 0.95.
>
>(I notice that some of the false negatives are close to the cutoff, but 
>most are numerically far from it, so perhaps I am answering my own question :-)
>
>This catches about 90% of my spam (I retrain -Ns with the false negatives) 
>and I haven't had a false positive yet.
>
>At present there are 18659 good messages and 1569 spam messages in the 
>respective wordlists.
>
>Jim

Jim,

The default values do a good job.  With additional work, such as testing 
different parameter sets against your own mail as described above, the 
results can be improved.
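
The comparison described in the quoted test (pick spam_cutoff so that false 
positives stay near a target rate, then count the false negatives) can be 
sketched roughly as follows.  This is only an illustration and not 
bogofilter code: the function name pick_cutoff, its arguments, and the 
assumption that the ham and spam scores have already been collected into 
plain lists are all hypothetical.

```python
from bisect import bisect_left


def pick_cutoff(ham_scores, spam_scores, target_fp_rate=0.002):
    """Return (cutoff, false_positives, false_negatives).

    Hypothetical helper, not part of bogofilter. A message counts as
    spam when its score is >= cutoff. The lowest cutoff that keeps
    false positives within the target rate is chosen, because a lower
    cutoff lets fewer spam messages slip through.
    """
    ham = sorted(ham_scores)
    spam = sorted(spam_scores)

    # Number of ham messages we are willing to see misclassified.
    allowed_fp = int(target_fp_rate * len(ham))

    # Try every observed score as a cutoff, lowest first. The final
    # candidate (infinity) classifies nothing as spam, so the loop
    # always returns.
    candidates = sorted(set(ham) | set(spam)) + [float("inf")]
    for cutoff in candidates:
        # Ham scored at or above the cutoff is a false positive.
        fp = len(ham) - bisect_left(ham, cutoff)
        if fp <= allowed_fp:
            # Spam scored below the cutoff is a false negative.
            fn = bisect_left(spam, cutoff)
            return cutoff, fp, fn
```

With 9,567 ham messages, a 0.2% target allows about 19 false positives.  
Running a helper like this once per parameter set, on scores from the same 
test messages, gives false negative counts that can be compared directly.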

David





