Very low cutoffs (was: headers)

Bill McClain wmcclain at salamander.com
Mon Feb 23 20:41:42 CET 2004


On Wed, 18 Feb 2004 23:43:40 -0500
David Relson <relson at osagesoftware.com> wrote:

> To answer your question, I'd say your spam_cutoff is unusually low.
> 
> I'm using ham_cutoff=0.45 and spam_cutoff=0.501 and get several
> unsures every day.  I believe bogotune suggested 0.500001 for the
> spam_cutoff. However since I regularly see 0.500000 from lists at
> gnu.org, I decided to use a slightly higher value.
> 
> By the way, spam_cutoff and ham_cutoff are only part of the story. 
> What are robs, robx, and min_dev? 

Picking up from last week (I've been away): we were talking about how even
middling message scores are very spammy if your cutoffs are low. From my last
bogotune run:

robx=0.400000
min_dev=0.020
robs=0.0100
spam_cutoff=0.016       # for 0.05% fpos (1); expect 0.03% fneg (1).
#spam_cutoff=0.007      # for 0.10% fpos (2); expect 0.03% fneg (1).
#spam_cutoff=0.003      # for 0.20% fpos (4); expect 0.03% fneg (1).
ham_cutoff=0.003        
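
To make the point about middling scores concrete, here is a minimal Python
sketch of a three-way cutoff rule. It is an illustration, not bogofilter's
code: the function name classify() is hypothetical, and the inclusive
comparisons (>= and <=) are an assumption.

```python
def classify(score, spam_cutoff=0.016, ham_cutoff=0.003):
    """Three-way verdict from a message score in [0, 1].

    Defaults are the cutoffs from the bogotune run above.
    Assumption: both comparisons are inclusive.
    """
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"


if __name__ == "__main__":
    # The same middling score under the two sets of cutoffs in this thread.
    print(classify(0.30))                                       # my cutoffs
    print(classify(0.30, spam_cutoff=0.501, ham_cutoff=0.45))   # David's cutoffs
```

With these cutoffs a score of 0.30 falls on the spam side, while with
ham_cutoff=0.45 and spam_cutoff=0.501 the same score falls on the ham side.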

I get this warning: "Too few high-scoring non-spams in this data set".

Overall, since I first had enough spam to use bogotune, bogofilter has
caught 98.7% of my spam, and the rate is improving. False positives run at
0.1%, usually new types of mail that need to be reclassified once.

I add all my spam to the database, but have stopped adding ham for a
while. Other stats:

bogoutil -w ~/.bogofilter/ .MSG_COUNT
                                 spam   good
.MSG_COUNT                       5721  10231
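
For reference, a rough Python sketch of how the message counts combine with
robs, robx, and min_dev under Robinson's smoothing, f(w) = (s*x + n*p)/(s + n).
The function names are hypothetical, the per-message frequency form of p is my
reading of the method rather than a quote of bogofilter's source, and the
inclusive min_dev test is an assumption.

```python
def token_score(spam_count, good_count,
                spam_msgs=5721, good_msgs=10231,
                robs=0.01, robx=0.4):
    """Robinson-smoothed spamicity f(w) for a single token.

    spam_count / good_count: how often the token was seen in spam / ham.
    spam_msgs / good_msgs: the .MSG_COUNT totals shown above.
    robs (s) and robx (x): values from the bogotune run above.
    """
    n = spam_count + good_count
    if n == 0:
        # Never-seen token: the formula reduces to the prior x.
        return robx
    spam_freq = spam_count / spam_msgs
    good_freq = good_count / good_msgs
    p = spam_freq / (spam_freq + good_freq)  # raw estimate
    return (robs * robx + n * p) / (robs + n)


def is_significant(f, min_dev=0.02):
    """Assumption: a token counts only if it is at least min_dev from 0.5."""
    return abs(f - 0.5) >= min_dev


if __name__ == "__main__":
    hapax = token_score(spam_count=1, good_count=0)
    print(round(hapax, 4), is_significant(hapax))
```

With robs this small, a token seen only once in spam already scores very
close to 1, so the prior robx has almost no pull on it.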

bogoutil -H ~/.bogofilter/
Histogram
score   count  pct  histogram
0.00    59590 35.35 #################################
0.05     1161  0.69 #
0.10     1367  0.81 #
0.15     1386  0.82 #
0.20     1478  0.88 #
0.25     1220  0.72 #
0.30     1370  0.81 #
0.35     1401  0.83 #
0.40      920  0.55 #
0.45     1934  1.15 ##
0.50      994  0.59 #
0.55      751  0.45 #
0.60     1869  1.11 ##
0.65      452  0.27 #
0.70     1013  0.60 #
0.75     1252  0.74 #
0.80     1177  0.70 #
0.85     1191  0.71 #
0.90      873  0.52 #
0.95    87190 51.72 ################################################
tot    168589
hapaxes:  ham    3875 ( 2.30%), spam   53376 (31.66%)
   pure:  ham   58777 (34.86%), spam   89007 (52.80%)
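
A small Python sketch for tallying the histogram above. It only re-adds the
score and count columns as printed; the names HISTOGRAM and parse_histogram
are hypothetical, and nothing here calls bogoutil.

```python
# Score and count columns copied from the bogoutil -H output above.
HISTOGRAM = """\
0.00 59590
0.05 1161
0.10 1367
0.15 1386
0.20 1478
0.25 1220
0.30 1370
0.35 1401
0.40 920
0.45 1934
0.50 994
0.55 751
0.60 1869
0.65 452
0.70 1013
0.75 1252
0.80 1177
0.85 1191
0.90 873
0.95 87190
"""


def parse_histogram(text):
    """Return a list of (bucket_start, token_count) pairs."""
    buckets = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            buckets.append((float(parts[0]), int(parts[1])))
    return buckets


buckets = parse_histogram(HISTOGRAM)
total = sum(count for _, count in buckets)
# Tokens in the lowest and highest buckets (0.00 and 0.95).
extreme = buckets[0][1] + buckets[-1][1]

if __name__ == "__main__":
    print(total, extreme, f"{100 * extreme / total:.2f}%")
```

The counts add up to the reported total of 168589, and the two end buckets
together hold about 87% of all tokens.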

-Bill
-- 
Sattre Press                                The King in Yellow
http://sattre-press.com/                 by Robert W. Chambers
info at sattre-press.com         http://sattre-press.com/kiy.html



