evaluating possible new options

michael at optusnet.com.au michael at optusnet.com.au
Fri May 16 04:26:28 CEST 2003


David Relson <relson at osagesoftware.com> writes:
[..]
> Michael,
> 
> The drop in false negatives is great!  Many more spam are getting caught.
> 
> You don't mention your spam_cutoff value, or how you're choosing it.

0.95 cutoff. No science behind it at all. 

> Greg's methodology starts by scoring a corpus of non-spam to determine
> a spam_cutoff value and then using that to score several corpora of
> spam and count the false negatives (a.k.a. missed spam).  Doing that,
> together with using wordlists built with the parameters being tested,
> gives him his results.
> 
> My experiments with bogofilter's default parameters (done a week
> or so ago, before recent changes), indicate that a min_dev in the
> range of 0.35 to 0.45 would be best.  Also, when changing min_dev,
> it's best to change spam_cutoff.  (FWIW, I'm presently using
> min_dev=0.40 and spam_cutoff=0.500).  Anyhow, you might find it
> interesting to test with some higher min_dev values.

I'll set up a run over a range of values at some point.
Right now, the improved accuracy is turning up a bunch
of misfiled emails, so I'm cleaning up the corpus a little.
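
For reference, the methodology David describes above (pick a spam_cutoff
from a non-spam corpus, then count missed spam in several spam corpora)
could be sketched roughly as follows. This is only an illustration working
on already-computed scores: the function names, the sample scores, the
"allow at most N false positives" rule for picking the cutoff, and the
strict "score > cutoff means spam" comparison are all assumptions of mine,
not Greg's actual procedure or bogofilter's internals.

```python
def choose_cutoff(ham_scores, max_false_positives=0):
    """Pick a cutoff from non-spam scores (assumed rule, not Greg's exact one).

    A message counts as spam when score > cutoff, so returning the
    (max_false_positives + 1)-th highest ham score leaves at most
    max_false_positives ham messages above the cutoff.
    """
    if not ham_scores:
        raise ValueError("need at least one non-spam score")
    ranked = sorted(ham_scores, reverse=True)
    index = min(max_false_positives, len(ranked) - 1)
    return ranked[index]


def count_false_negatives(spam_scores, cutoff):
    """Count spam messages that would be missed at this cutoff."""
    return sum(1 for score in spam_scores if score <= cutoff)


def evaluate(ham_scores, spam_corpora, max_false_positives=0):
    """Return the chosen cutoff and missed-spam counts per spam corpus."""
    cutoff = choose_cutoff(ham_scores, max_false_positives)
    missed = {
        name: count_false_negatives(scores, cutoff)
        for name, scores in spam_corpora.items()
    }
    return cutoff, missed


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    ham = [0.01, 0.10, 0.42, 0.30]
    spam = {"corpus_a": [0.99, 0.40, 0.97], "corpus_b": [0.50, 0.20]}
    cutoff, missed = evaluate(ham, spam)
    print("cutoff:", cutoff)
    print("false negatives:", missed)
```

To compare parameter settings such as different min_dev values, the same
evaluation would be repeated with scores produced under each setting, with
the wordlists rebuilt for that setting as David notes.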

Michael.



