testing min_dev vs tag_header_lines

Greg Louis glouis at dynamicro.on.ca
Fri Feb 14 18:44:27 CET 2003


On 20030214 (Fri) at 1230:51 -0500, David Relson wrote:
> Greetings,
> 
> I've rerun my earlier test with min_dev=0.100, 0.075, and 0.050.  Default 
> values were used for robs (0.001), robx (0.415), spam_cutoff (0.95), and 
> ham_cutoff (0.100).
> 
>                s-s   s-h   s-u     h-s   h-h   h-u
> tag-0.100     1604     3   138       2  4918   124
> tag-0.075     1615     5   125       2  4950    92
> tag-0.050     1632     6   107       3  4955    86
> 
> There's a clear pattern - as min_dev decreases, the number of correct 
> classifications rises - and so does the number of false positives and false 
> negatives.
> 
> The additional false positive is the "David, recognize any of these 7 
> names?" from classmates.com.  I recognize it as being the highest scoring 
> ham message.  For the three values of min_dev, this message gets scores of 
> 0.885221, 0.934846, and 0.957329 - all very high scores for ham.
> 
> Oh well, bogofilter is only a program.  It's not yet smart enough to know 
> _exactly_ what I want.
> 

If you change min_dev, you need to change spam_cutoff as well, because
the scores themselves shift with min_dev.  Comparing false-positive
counts obtained with an invariant spam_cutoff value says nothing about
discrimination quality.
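
One way to make the comparison fair is to hold the false-positive count
fixed instead of the cutoff: for each min_dev run, pick the spam_cutoff
that admits at most a chosen number of false positives, then compare how
much spam falls below it.  The sketch below is only an illustration and
not part of bogofilter.  The function names, the toy scores, and the rule
that a score at or above the cutoff counts as spam are all assumptions
made for the example (math.nextafter needs Python 3.9 or later).

```python
import math


def cutoff_for_max_fp(ham_scores, max_fp):
    """Return the lowest cutoff that lets at most max_fp ham scores reach it.

    Assumption: a message is called spam when score >= cutoff.
    """
    ranked = sorted(ham_scores, reverse=True)
    if max_fp >= len(ranked):
        return 0.0  # every ham message is allowed to be a false positive
    # The (max_fp + 1)-th highest ham score must stay below the cutoff,
    # so place the cutoff at the next representable float above it.
    return math.nextafter(ranked[max_fp], math.inf)


def missed_spam(spam_scores, cutoff):
    """Count spam messages that score below the cutoff (ham or unsure)."""
    return sum(1 for score in spam_scores if score < cutoff)


def compare(runs, max_fp):
    """Map each run name to (tuned cutoff, missed spam at that cutoff)."""
    results = {}
    for name, (ham_scores, spam_scores) in runs.items():
        cutoff = cutoff_for_max_fp(ham_scores, max_fp)
        results[name] = (cutoff, missed_spam(spam_scores, cutoff))
    return results


# Invented toy scores, NOT taken from the test reported above.
runs = {
    "min_dev=0.100": ([0.10, 0.30, 0.88], [0.90, 0.97, 0.99]),
    "min_dev=0.050": ([0.05, 0.20, 0.96], [0.95, 0.99, 1.00]),
}

results = compare(runs, max_fp=0)
for name, (cutoff, missed) in results.items():
    print(f"{name}: spam_cutoff={cutoff:.6f} missed_spam={missed}")
```

With the false-positive count pinned this way, the missed-spam counts of
the runs can be compared directly; a lower count at the same number of
false positives suggests better discrimination.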

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



