evaluating possible new options

David Relson relson at osagesoftware.com
Fri May 16 05:09:13 CEST 2003


At 10:26 PM 5/15/03, michael at optusnet.com.au wrote:

>David Relson <relson at osagesoftware.com> writes:
>[..]
> > Michael,
> >
> > The drop in false negatives is great!  Many more spam are getting caught.
> >
> > You don't mention your spam_cutoff value, or how you're choosing it.
>
>0.95 cutoff. No science behind it at all.
>
> > Greg's methodology starts by scoring a corpus of non-spam to determine
> > a spam_cutoff value and then using that to score several corpora of
> > spam and count the false negatives (a.k.a. missed spam).  Doing that,
> > together with using wordlists built with the parameters being tested,
> > gives him his results.
> >
> > My experiments with bogofilter's default parameters (done a week
> > or so ago, before recent changes), indicate that a min_dev in the
> > range of 0.35 to 0.45 would be best.  Also, when changing min_dev,
> > it's best to change spam_cutoff.  (FWIW, I'm presently using
> > min_dev=0.40 and spam_cutoff=0.500).  Anyhow, you might find it
> > interesting to test with some higher min_dev values.
>
>I'll set up a run for a range of variables at some point.
>Right now, the improved accuracy is turning up a bunch
>of mis-filed emails so I'm cleaning the corpus a little.
>
>Michael.

Understood.  Seems like every time I look deeply into why there's an 
unexpected result I discover a filing error.
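
For anyone following along, here is a rough sketch of the two-step methodology quoted above: score a non-spam corpus to pick spam_cutoff, then score each spam corpus and count the false negatives. It is an illustration only, not Greg's actual scripts. The function names, the target_fp_rate parameter, and the strict "score > cutoff means spam" comparison are all assumptions made for this sketch; the scores are taken to be per-message spamicity values between 0 and 1 that have already been computed.

```python
def choose_cutoff(ham_scores, target_fp_rate=0.001):
    """Pick a cutoff from non-spam scores (hypothetical helper).

    Assumption: a message counts as spam when score > cutoff, and we
    tolerate at most target_fp_rate of the non-spam corpus above it.
    """
    if not ham_scores:
        raise ValueError("need at least one non-spam score")
    ranked = sorted(ham_scores, reverse=True)
    # Number of non-spam messages we allow to score above the cutoff.
    allowed = min(int(len(ranked) * target_fp_rate), len(ranked) - 1)
    # Everything strictly above ranked[allowed] is at most `allowed` messages.
    return ranked[allowed]


def count_false_negatives(spam_scores, cutoff):
    """Count spam messages that would not be flagged (missed spam)."""
    return sum(1 for score in spam_scores if score <= cutoff)


def evaluate(ham_scores, spam_corpora, target_fp_rate=0.001):
    """Return the chosen cutoff and false-negative counts per spam corpus."""
    cutoff = choose_cutoff(ham_scores, target_fp_rate)
    misses = {name: count_false_negatives(scores, cutoff)
              for name, scores in spam_corpora.items()}
    return cutoff, misses
```

To compare parameter sets (for example, different min_dev values), the scores would have to be regenerated with wordlists built under each setting before calling evaluate, so that the false-negative counts are compared at an equivalent false-positive level.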




