bogotrain

David Relson relson at osagesoftware.com
Wed Aug 4 21:57:42 CEST 2004


On Wed, 4 Aug 2004 13:04:59 -0400
Bob Vincent wrote:

...[snip]..

> Thanks.  Will double-check.  Bogofilter seems to have peaked at about
> 99.5% accuracy. I'd like to exceed that, as 0.5% of 4000+ messages
> still means that I see roughly 20 spams a day.

Bob,

Here's an additional thought on choosing new parameters.  Use the values
from bogotune to score your test messages, look at the resulting
spamicities, and then choose cutoff values that you consider good.

Assuming directory tune.d contains the wordlist.db used for bogotune,
and that you've put the new parameters into tune.d/bogofilter.cf, run
the following:

BOGO_OPTS="-d tune.d -c tune.d/bogofilter.cf -v"
bogofilter $BOGO_OPTS -B *.ham | sort -n -k7 | grep -v 0.000000 > ham.scores
bogofilter $BOGO_OPTS -B *.spam | sort -n -k7 | grep -v 1.000000 > spam.scores

Since both files are sorted in ascending order, the end of ham.scores
shows the highest ham scores, i.e. the false positives you'd get if
spam_cutoff were too low, and the start of spam.scores shows the lowest
spam scores, i.e. the false negatives you'd get if ham_cutoff were too
high.
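
A small sketch for pulling out those extremes.  The helper name
show_extremes is hypothetical (it is not part of bogofilter), and it
assumes both files are already sorted in ascending order, as the
pipelines above produce:

```shell
#!/bin/sh
# Hypothetical helper: print the N highest-scoring ham and the
# N lowest-scoring spam from ascending-sorted score files.
show_extremes() {  # usage: show_extremes HAM_FILE SPAM_FILE [N]
    n=${3:-10}
    echo "Highest ham scores (false-positive risk):"
    tail -n "$n" "$1"    # largest values sit at the end of an ascending sort
    echo "Lowest spam scores (false-negative risk):"
    head -n "$n" "$2"    # smallest values sit at the start
}
```

For example: show_extremes ham.scores spam.scores 20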

With that info, you'll be able to pick acceptable cutoff values.
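
One way to compare candidate cutoffs is to count how many test messages
each would misfile.  This is a minimal sketch, not a bogofilter feature:
the function names are hypothetical, it assumes the spamicity is field 7
of each line (the field selected by +6, equivalently -k7, in the sort
above), and it treats both cutoffs as inclusive boundaries, which you
should check against bogofilter's documentation.

```shell
#!/bin/sh
# Hypothetical helpers for trying out cutoff values on the score files.
# ASSUMPTION: spamicity is whitespace-separated field 7; override by
# setting SCORE_FIELD if your bogofilter -v output differs.

# Ham scoring at or above spam_cutoff would be tagged as spam.
count_false_positives() {  # usage: count_false_positives FILE SPAM_CUTOFF
    awk -v cut="$2" -v f="${SCORE_FIELD:-7}" \
        '$f + 0 >= cut + 0 { n++ } END { print n + 0 }' "$1"
}

# Spam scoring at or below ham_cutoff would be tagged as ham.
count_false_negatives() {  # usage: count_false_negatives FILE HAM_CUTOFF
    awk -v cut="$2" -v f="${SCORE_FIELD:-7}" \
        '$f + 0 <= cut + 0 { n++ } END { print n + 0 }' "$1"
}
```

For example, count_false_positives ham.scores 0.95 and
count_false_negatives spam.scores 0.10 report how many of your test
messages those values would misfile.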

HTH,

David

Note: the exact bogofilter commands will depend on whether your messages
are stored in mboxes, maildirs, MH format, etc.  I leave the details to
you.
