bogotrain

David Relson relson at osagesoftware.com
Wed Aug 4 18:44:30 CEST 2004


On Wed, 4 Aug 2004 12:27:19 -0400
Bob Vincent wrote:

> Weird.
> I finally accumulated enough ham to run bogotrain, and here's what it

----------------------------------------- bogotune, not bogotrain

> came up with:
> 
> ---cut---
> db_cachesize=2
> robx=0.539781
> min_dev=0.178
> robs=0.0178
> sp_esf=0.011573
> ns_esf=0.004228
> spam_cutoff=0.320443    # for 0.05% fp (1); expect 0.03% fn (1).
> #spam_cutoff=0.282543   # for 0.10% fp (2); expect 0.00% fn (0).
> #spam_cutoff=0.270560   # for 0.20% fp (4); expect 0.00% fn (0).
> ham_cutoff=0.340
> ---cut---
> 
> I don't understand why the suggested spam_cutoff is lower than the
> suggested ham_cutoff.  Can anyone explain?

Bob,

bogotune determines the cutoff values by looking at arrays of message
scores.  The spam_cutoff is determined from the non-spam scores (to set
the fp (false positive) values) and the ham_cutoff is determined from
the spam scores.

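'Tis likely that some of your spam messages have hammish scores (and
vice versa), and that's causing the results you're seeing.  Since the
two cutoffs come from different score arrays, nothing forces the
spam_cutoff to stay above the ham_cutoff.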
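
To make that concrete, here is a rough Python sketch of the idea.  It is
not bogotune's actual code: the function names, the "tolerate N errors"
rule and the sample scores are all made up for illustration, and ties
between scores are ignored.

```python
def spam_cutoff_for(ham_scores, max_fp):
    """Hypothetical rule: tolerate the max_fp highest-scoring ham as
    false positives and put the cutoff at the lowest tolerated score."""
    ranked = sorted(ham_scores, reverse=True)
    return ranked[max_fp - 1]


def ham_cutoff_for(spam_scores, max_fn):
    """Hypothetical mirror image: tolerate the max_fn lowest-scoring spam
    and put the cutoff at the highest tolerated score."""
    ranked = sorted(spam_scores)
    return ranked[max_fn - 1]


# Made-up samples: one ham scores spammy (0.45), one spam scores hammish (0.34).
ham = [0.0, 0.0, 0.01, 0.30, 0.45]
spam = [0.34, 0.40, 0.99, 1.0, 1.0]

spam_cutoff = spam_cutoff_for(ham, max_fp=2)   # computed from ham scores only
ham_cutoff = ham_cutoff_for(spam, max_fn=2)    # computed from spam scores only

print("spam_cutoff =", spam_cutoff)
print("ham_cutoff  =", ham_cutoff)
print("inverted    =", spam_cutoff < ham_cutoff)
```

With these invented scores the spam_cutoff lands below the ham_cutoff,
simply because each value is picked from a different array and the two
arrays overlap.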

How large are your message samples (spam and non-spam)?  Have you
checked for incorrectly classified messages?  

Attached is a script I wrote a couple of days ago for checking
classifications.  It uses the current wordlist, scores a set of ham (or
spam) messages, discards ham scoring 0.000000 and spam scoring
1.000000, and sorts the remaining scores.  With modification for your
directories/mboxes/etc., it should help you find any misclassifications.
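
For anyone who can't fetch the attachment, the filter-and-sort step can
be sketched in Python as below.  This is not the attached script: it
assumes the scores have already been obtained from bogofilter as
(message, score) pairs, the function name and file names are
hypothetical, and the sort direction (most suspicious first) is my own
choice.

```python
def suspicious(scored, kind):
    """Return the messages worth a second look, most suspicious first.

    scored: iterable of (message_name, score) pairs
    kind:   "ham" or "spam", i.e. how the messages are currently filed
    """
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    # A perfectly scored message is uninteresting: 0.0 for ham, 1.0 for spam.
    perfect = 0.0 if kind == "ham" else 1.0
    kept = [(name, score) for name, score in scored if score != perfect]
    # High-scoring ham and low-scoring spam are the likely misfilings.
    return sorted(kept, key=lambda pair: pair[1], reverse=(kind == "ham"))


# Made-up example data.
ham_scored = [("msg1", 0.0), ("msg2", 0.45), ("msg3", 0.01), ("msg4", 0.0)]
for name, score in suspicious(ham_scored, "ham"):
    print(f"{score:.6f} {name}")
```

Messages at the top of the list are the first ones to inspect by hand.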

HTH,

David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: score.all.sh
Type: application/x-sh
Size: 219 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20040804/bd78eca8/attachment.sh>

