testing min_dev vs tag_header_lines

David Relson relson at osagesoftware.com
Fri Feb 14 18:30:51 CET 2003


Greetings,

I've rerun my earlier test with min_dev=0.100, 0.075, and 0.050.  Default 
values were used for robs (0.001), robx (0.415), spam_cutoff (0.95), and 
ham_cutoff (0.100).

                  s-s  s-h  s-u      h-s  h-h  h-u
tag-0.100       1604   3   138       2  4918  124
tag-0.075       1615   5   125       2  4950   92
tag-0.050       1632   6   107       3  4955   86

There's a clear pattern - as min_dev decreases, fewer messages are left 
unsure, so the number of correct classifications rises - and so does the 
number of false positives (h-s: 2, 2, 3) and false negatives (s-h: 3, 5, 6).
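
For anyone who wants to check the arithmetic, here is a minimal Python sketch 
that totals the table.  The counts are copied from the table above; the 
reading of the column names (first letter is the true class, second letter is 
the verdict, with u meaning unsure) is an assumption, and the names RESULTS 
and summarize are made up for this illustration.

```python
# Counts copied from the table in this message.
# Assumed column meaning: first letter = true class (s = spam, h = ham),
# second letter = verdict (s = spam, h = ham, u = unsure).
RESULTS = {
    0.100: {"s-s": 1604, "s-h": 3, "s-u": 138, "h-s": 2, "h-h": 4918, "h-u": 124},
    0.075: {"s-s": 1615, "s-h": 5, "s-u": 125, "h-s": 2, "h-h": 4950, "h-u": 92},
    0.050: {"s-s": 1632, "s-h": 6, "s-u": 107, "h-s": 3, "h-h": 4955, "h-u": 86},
}


def summarize(counts):
    """Collapse one table row into totals per outcome."""
    return {
        "total": sum(counts.values()),
        "correct": counts["s-s"] + counts["h-h"],
        "unsure": counts["s-u"] + counts["h-u"],
        "false_neg": counts["s-h"],  # spam scored as ham
        "false_pos": counts["h-s"],  # ham scored as spam
    }


if __name__ == "__main__":
    # Print rows from the largest min_dev to the smallest, as in the table.
    for min_dev in sorted(RESULTS, reverse=True):
        s = summarize(RESULTS[min_dev])
        print(
            f"min_dev={min_dev:.3f}  correct={s['correct']}  "
            f"unsure={s['unsure']}  false_neg={s['false_neg']}  "
            f"false_pos={s['false_pos']}  total={s['total']}"
        )
```

Every row sums to the same 6789 messages (1745 spam, 5044 ham), so the rows 
are directly comparable.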

The additional false positive at min_dev=0.050 is the "David, recognize any 
of these 7 names?" message from classmates.com, which I recognize as the 
highest scoring ham message.  For the three values of min_dev, this message 
gets scores of 0.885221, 0.934846, and 0.957329 - all very high scores for 
ham, but only the last is above spam_cutoff (0.95).
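
To show how those three scores land relative to the cutoffs, here is a small 
Python sketch of a three-way decision.  The cutoffs and scores are the ones 
quoted in this message; the classify function is hypothetical, and the exact 
boundary handling (>= and <=) is an assumption rather than a statement of how 
bogofilter compares scores internally.

```python
SPAM_CUTOFF = 0.95   # spam_cutoff used in this test
HAM_CUTOFF = 0.100   # ham_cutoff used in this test

# Scores of the classmates.com message, keyed by min_dev.
SCORES = {0.100: 0.885221, 0.075: 0.934846, 0.050: 0.957329}


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Three-way verdict; the inclusive boundaries are an assumption."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    for min_dev in sorted(SCORES, reverse=True):
        score = SCORES[min_dev]
        print(f"min_dev={min_dev:.3f}  score={score:.6f}  -> {classify(score)}")
```

Under these assumptions the message is unsure at min_dev=0.100 and 0.075 and 
becomes spam only at 0.050.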

Oh well, bogofilter is only a program.  It's not yet smart enough to know 
_exactly_ what I want.

So be it.

David




