relson at osagesoftware.com
Wed Aug 4 15:57:42 EDT 2004
On Wed, 4 Aug 2004 13:04:59 -0400
Bob Vincent wrote:
> Thanks. Will double-check. Bogofilter seems to have peaked at about
> 99.5% accuracy. I'd like to exceed that, as 0.5% of 4000+ messages
> still means that I see roughly 20 spams a day.
An additional thought on new parameters for you. Use the values from
bogotune to score your test messages, take a look at the resulting
spamicities, then choose cutoff values that look good to you.
Let's assume directory tune.d contains the wordlist.db used for bogotune
and you've put the new parameters into tune.d/bogofilter.cf. Then run the
following commands:
BOGO_OPTS="-d tune.d -c tune.d/bogofilter.cf -v"
bogofilter $BOGO_OPTS -B *.ham | sort -n +6 | grep -v 0.000000 > ham.scores
bogofilter $BOGO_OPTS -B *.spam | sort -n +6 | grep -v 1.000000 > spam.scores
Looking at ham.scores and spam.scores will show you the highest ham
scores, i.e. false positives if spam_cutoff is too low, and the lowest
spam scores, i.e. false negatives if ham_cutoff is too high.
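If you'd rather not eyeball the files, here is a minimal sketch that pulls out the two extremes. It assumes the scored output was saved to ham.scores and spam.scores and that each line contains "spamicity=<value>" in X-Bogosity header style. The helper names (spamicities, max_ham, min_spam) are made up for illustration and are not part of bogofilter.

```shell
# Hypothetical helper: print the spamicity value from each line of a file.
# Assumes each line contains "spamicity=<number>".
spamicities() {
    sed -n 's/.*spamicity=\([0-9.]*\).*/\1/p' "$1"
}

# Highest-scoring ham: spam_cutoff needs to sit above this value,
# otherwise that message becomes a false positive.
max_ham() {
    spamicities "$1" | LC_ALL=C sort -n | tail -n 1
}

# Lowest-scoring spam: ham_cutoff needs to sit below this value,
# otherwise that message becomes a false negative.
min_spam() {
    spamicities "$1" | LC_ALL=C sort -n | head -n 1
}
```

Running "max_ham ham.scores" and "min_spam spam.scores" prints the two numbers your cutoffs have to fit around.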
With that info, you'll be able to pick acceptable cutoff values.
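To try out candidate cutoffs before committing to them, something like the following sketch counts how many messages would land on the wrong side. It assumes the same "spamicity=<value>" line format in ham.scores and spam.scores. The function names are hypothetical, and treating a score equal to the cutoff as over the line is my assumption here, so check it against your bogofilter version.

```shell
# count_at_or_above FILE CUTOFF
# Number of messages in FILE whose spamicity is >= CUTOFF.
# Run on ham.scores with a candidate spam_cutoff to count false positives.
count_at_or_above() {
    LC_ALL=C awk -v cutoff="$2" '
        match($0, /spamicity=[0-9.]+/) {
            # "spamicity=" is 10 characters long; skip past it.
            score = substr($0, RSTART + 10, RLENGTH - 10) + 0
            if (score >= cutoff) n++
        }
        END { print n + 0 }' "$1"
}

# count_at_or_below FILE CUTOFF
# Number of messages in FILE whose spamicity is <= CUTOFF.
# Run on spam.scores with a candidate ham_cutoff to count false negatives.
count_at_or_below() {
    LC_ALL=C awk -v cutoff="$2" '
        match($0, /spamicity=[0-9.]+/) {
            score = substr($0, RSTART + 10, RLENGTH - 10) + 0
            if (score <= cutoff) n++
        }
        END { print n + 0 }' "$1"
}
```

For example, "count_at_or_above ham.scores 0.99" shows how many hams a spam_cutoff of 0.99 would flag, and "count_at_or_below spam.scores 0.10" shows how many spams a ham_cutoff of 0.10 would let through as ham.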
Note: the exact bogofilter commands will depend on whether your messages
are stored in mboxes, maildirs, MH format, etc. I leave the details to
you.