Very low cutoffs (was: headers)
Bill McClain
wmcclain at salamander.com
Mon Feb 23 20:41:42 CET 2004
On Wed, 18 Feb 2004 23:43:40 -0500
David Relson <relson at osagesoftware.com> wrote:
> To answer your question, I'd say your spam_cutoff is unusually low.
>
> I'm using ham_cutoff=0.45 and spam_cutoff=0.501 and get several
> unsures every day. I believe bogotune suggested 0.500001 for the
> spam_cutoff. However since I regularly see 0.500000 from lists at
> gnu.org, I decided to use a slightly higher value.
>
> By the way, spam_cutoff and ham_cutoff are only part of the story.
> What are robs, robx, and min_dev?
From last week (I've been away) we were talking about how even middling
message scores are very spammy if your cutoffs are small. From my last
bogotune run:
robx=0.400000
min_dev=0.020
robs=0.0100
spam_cutoff=0.016 # for 0.05% fpos (1); expect 0.03% fneg (1).
#spam_cutoff=0.007 # for 0.10% fpos (2); expect 0.03% fneg (1).
#spam_cutoff=0.003 # for 0.20% fpos (4); expect 0.03% fneg (1).
ham_cutoff=0.003
I get this warning: "Too few high-scoring non-spams in this data set".
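
To show why a middling score is already very spammy with cutoffs this small, here is a minimal sketch of the three-way split the two cutoffs imply. The cutoff values are copied from the bogotune output above; the `classify` helper and its boundary handling (`>=` for spam, `<=` for ham) are assumptions made for illustration, not bogofilter's actual code.

```python
# Cutoffs copied from the bogotune output above.
HAM_CUTOFF = 0.003
SPAM_CUTOFF = 0.016


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Hypothetical helper: three-way split of a message score.

    The boundary handling (>= for spam, <= for ham) is an assumption.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# Sample scores chosen only for illustration.
for s in (0.001, 0.010, 0.100, 0.500):
    print(f"{s:.3f} -> {classify(s)}")
```

Under these assumptions anything scoring 0.016 or higher counts as spam, so a message at 0.100 is well inside spam territory and only the narrow band between 0.003 and 0.016 is left as unsure.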
Overall, since I first had enough spam to use bogotune, bogofilter has
caught 98.7% of my spam, and the ratio is improving. False positives are
0.1%, usually new types of mail that have to be reclassified once.
I add all my spam to the database, but have stopped adding ham for a
while. Other stats:
bogoutil -w ~/.bogofilter/ .MSG_COUNT
                spam    good
.MSG_COUNT      5721   10231
bogoutil -H ~/.bogofilter/
Histogram
score   count    pct  histogram
0.00    59590  35.35  #################################
0.05     1161   0.69  #
0.10     1367   0.81  #
0.15     1386   0.82  #
0.20     1478   0.88  #
0.25     1220   0.72  #
0.30     1370   0.81  #
0.35     1401   0.83  #
0.40      920   0.55  #
0.45     1934   1.15  ##
0.50      994   0.59  #
0.55      751   0.45  #
0.60     1869   1.11  ##
0.65      452   0.27  #
0.70     1013   0.60  #
0.75     1252   0.74  #
0.80     1177   0.70  #
0.85     1191   0.71  #
0.90      873   0.52  #
0.95    87190  51.72  ################################################
tot    168589
hapaxes:  ham    3875 ( 2.30%), spam   53376 (31.66%)
   pure:  ham   58777 (34.86%), spam   89007 (52.80%)
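
As a quick sanity check on the histogram above, this sketch recomputes the total and the percentage column from the counts, and works out how much of the distribution sits in the two extreme buckets. The counts are copied from the output; the variable names are mine, and it assumes each percentage is simply count divided by the total.

```python
# Bucket counts copied from the bogoutil -H output above (0.00 through 0.95).
counts = [
    59590, 1161, 1367, 1386, 1478, 1220, 1370, 1401, 920, 1934,
    994, 751, 1869, 452, 1013, 1252, 1177, 1191, 873, 87190,
]

total = sum(counts)

# Assumption: each pct value is count / total, rounded to two decimals.
pcts = [round(100 * c / total, 2) for c in counts]

# Share of the distribution in the lowest and highest buckets combined.
extreme_share = 100 * (counts[0] + counts[-1]) / total

print("total:", total)
for i, (c, p) in enumerate(zip(counts, pcts)):
    print(f"{i * 0.05:.2f} {c:6d} {p:6.2f}")
print(f"lowest + highest bucket: {extreme_share:.2f}% of total")
```

The recomputed total matches the `tot` line, and the two end buckets hold the large majority of the entries, with comparatively little in the middle of the range.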
-Bill
--
Sattre Press The King in Yellow
http://sattre-press.com/ by Robert W. Chambers
info at sattre-press.com http://sattre-press.com/kiy.html