testing min_dev vs tag_header_lines
David Relson
relson at osagesoftware.com
Fri Feb 14 18:30:51 CET 2003
Greetings,
I've rerun my earlier test with min_dev=0.100, 0.075, and 0.050. Default
values were used for robs (0.001), robx (0.415), spam_cutoff (0.95), and
ham_cutoff (0.100).
             s-s   s-h   s-u   h-s   h-h   h-u
tag-0.100   1604     3   138     2  4918   124
tag-0.075   1615     5   125     2  4950    92
tag-0.050   1632     6   107     3  4955    86
There's a clear pattern: as min_dev decreases, the number of correct
classifications (s-s and h-h) rises, and so does the number of false
positives (h-s) and false negatives (s-h).
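
To make the trend easier to see, here is a small Python sketch that totals
each row of the table. It is only an illustration, not part of bogofilter.
It assumes the column names mean "actual-classified" (so s-h is spam
classified as ham and h-u is ham classified as unsure); the names RESULTS
and summarize are made up for this example.

```python
# Counts copied from the table above, keyed by min_dev.
# Assumption: "x-y" means a message of actual type x classified as y
# (s = spam, h = ham, u = unsure).
RESULTS = {
    "0.100": {"s-s": 1604, "s-h": 3, "s-u": 138, "h-s": 2, "h-h": 4918, "h-u": 124},
    "0.075": {"s-s": 1615, "s-h": 5, "s-u": 125, "h-s": 2, "h-h": 4950, "h-u": 92},
    "0.050": {"s-s": 1632, "s-h": 6, "s-u": 107, "h-s": 3, "h-h": 4955, "h-u": 86},
}


def summarize(row):
    """Collapse one table row into correct / error / unsure totals."""
    return {
        "correct": row["s-s"] + row["h-h"],
        "false_pos": row["h-s"],   # ham classified as spam
        "false_neg": row["s-h"],   # spam classified as ham
        "unsure": row["s-u"] + row["h-u"],
        "total": sum(row.values()),
    }


for min_dev, row in RESULTS.items():
    s = summarize(row)
    print(
        f"min_dev={min_dev}: correct={s['correct']} "
        f"fp={s['false_pos']} fn={s['false_neg']} "
        f"unsure={s['unsure']} total={s['total']}"
    )
```

With the numbers above, the correct count goes 6522, 6565, 6587 out of 6789
messages, while the unsure count drops from 262 to 217 to 193.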
The additional false positive is the "David, recognize any of these 7
names?" from classmates.com. I recognize it as being the highest scoring
ham message. For the three values of min_dev, this message gets scores of
0.885221, 0.934846, and 0.957329 - all very high scores for ham, and only
the last is above the spam_cutoff of 0.95.
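
A minimal sketch of how those three scores fall against the cutoffs used in
this test. The classify function is hypothetical, and the exact comparison
operators (>= and <=) are an assumption; bogofilter's own code may treat the
boundary cases differently.

```python
SPAM_CUTOFF = 0.95   # spam_cutoff used in this test
HAM_CUTOFF = 0.100   # ham_cutoff used in this test


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Three-way classification by score (assumed boundary handling)."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# Scores of the classmates.com message for each min_dev value.
scores = {"0.100": 0.885221, "0.075": 0.934846, "0.050": 0.957329}

for min_dev, score in scores.items():
    print(f"min_dev={min_dev}: score={score:.6f} -> {classify(score)}")
```

Under these assumptions the message is unsure at min_dev=0.100 and 0.075 and
only crosses into spam at 0.050, which matches the h-s column going from 2
to 3.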
Oh well, bogofilter is only a program. It's not yet smart enough to know
_exactly_ what I want.
So be it.
David