garbage removal and 'outsiders noise'
David Relson
relson at osagesoftware.com
Wed Apr 16 19:41:38 CEST 2003
At 01:37 PM 4/16/03, Jim Correia wrote:
>On Wednesday, April 16, 2003, at 01:22 PM, David Relson wrote:
>
>>His most recent test, "Bogofilter parameters(continued)", shows that
>>using different parameters can have a major effect in making bogofilter
>>more accurate.
>>
>>I ran a series of tests using my mail. I trained bogofilter with 6,173
>>spam and 18,784 ham and then scored 4,317 spam and 9,567 ham.
>>For each set of parameters tested, spam_cutoff was chosen to give approx
>>0.2% false positives. The number of false negatives varied from a high
>>of 290 to a low of 60.
>>
>>Conclusion: using a site's own email to determine the best parameters for
>>bogofilter can have a _big_ effect.
>>
>>Corollary: do a thorough test of algorithmic/parametric changes to
>>determine whether they are helpful or harmful.
>
>Is it naive of me to be running bogofilter with the defaults?
>
>I'm running with spam/ham classification, cutoff of 0.95.
>
>(I notice that some of the false negatives are close to the cutoff, but
>most are numerically far from it, so perhaps I am answering my own question :-)
>
>This catches about 90% of my spam (I retrain -Ns with the false negatives)
>and I haven't had a false positive yet.
>
>At present there are 18659 good messages and 1569 spam messages in the
>respective wordlists.
>
>Jim

Jim,

The default values do a good job. With additional work (tuning the
parameters against your own mail), the results can be improved.

David
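
For scale, using the figures quoted above: 0.2% of 9,567 ham is about 19 false positives, and 290 versus 60 false negatives out of 4,317 spam is roughly 6.7% versus 1.4% of spam missed. The comparison procedure described in the quoted message (fix the false positive rate, then count false negatives) might be sketched as follows. This is only an illustration, not bogofilter's own code: the function names are hypothetical, the scores are assumed to have been collected already by scoring each test message, and a message is assumed to be flagged as spam when its score is strictly above the cutoff.

```python
def pick_cutoff(ham_scores, target_fp_rate):
    """Return a cutoff such that at most target_fp_rate of the ham scores
    lie strictly above it (hypothetical helper, not part of bogofilter)."""
    if not ham_scores:
        raise ValueError("need at least one ham score")
    if not 0 <= target_fp_rate < 1:
        raise ValueError("target_fp_rate must be in [0, 1)")
    ranked = sorted(ham_scores, reverse=True)
    # Number of ham messages we are willing to see misclassified.
    allowed = min(int(len(ranked) * target_fp_rate), len(ranked) - 1)
    # Only the `allowed` highest ham scores can be strictly above this value.
    return ranked[allowed]


def count_errors(ham_scores, spam_scores, cutoff):
    """Count (false positives, false negatives) at the given cutoff,
    assuming a message is called spam when its score is > cutoff."""
    false_pos = sum(1 for s in ham_scores if s > cutoff)
    false_neg = sum(1 for s in spam_scores if s <= cutoff)
    return false_pos, false_neg
```

For each parameter set under test, one would score the same ham and spam corpora, call pick_cutoff(ham_scores, 0.002), and compare the false negative counts returned by count_errors. Holding the false positive rate fixed is what makes the false negative counts comparable between parameter sets.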