testing min_dev vs tag_header_lines
Greg Louis
glouis at dynamicro.on.ca
Fri Feb 14 18:44:27 CET 2003
On 20030214 (Fri) at 1230:51 -0500, David Relson wrote:
> Greetings,
>
> I've rerun my earlier test with min_dev=0.100, 0.075, and 0.050. Default
> values were used for robs (0.001), robx (0.415), spam_cutoff (0.95), and
> ham_cutoff (0.100).
>
>              s-s   s-h   s-u   h-s   h-h   h-u
> tag-0.100   1604     3   138     2  4918   124
> tag-0.075   1615     5   125     2  4950    92
> tag-0.050   1632     6   107     3  4955    86
>
> There's a clear pattern - as min_dev decreases, the number of correct
> classifications rises - and so does the number of false positives and false
> negatives.
>
> The additional false positive is the "David, recognize any of these 7
> names?" from classmates.com. I recognize it as being the highest scoring
> ham message. For the three values of min_dev, this message gets scores of
> 0.885221, 0.934846, and 0.957329 - all very high scores for ham.
>
> Oh well, bogofilter is only a program. It's not yet smart enough to know
> _exactly_ what I want.
>
If you change min_dev you need to change spam_cutoff. Comparison of
false-positive counts obtained with an invariant spam_cutoff value says
nothing about discrimination quality.
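
A minimal sketch of one way to make such a comparison fair: for each
min_dev setting, pick the spam_cutoff that yields the same number of false
positives, then compare false negatives at that cutoff. The score lists
below are hypothetical, made up for illustration and not taken from the
test corpus; the function names are also hypothetical, and the rule
"score >= cutoff means spam" is an assumption about how the cutoff is applied.

```python
def cutoff_for_fp_target(ham_scores, max_fp):
    """Return the lowest candidate cutoff that lets at most max_fp ham
    messages score at or above it (assumes score >= cutoff means spam)."""
    candidates = sorted(set(ham_scores)) + [float("inf")]
    for cutoff in candidates:
        if sum(h >= cutoff for h in ham_scores) <= max_fp:
            return cutoff


def false_positives(ham_scores, cutoff):
    """Count ham messages that would be classified as spam."""
    return sum(h >= cutoff for h in ham_scores)


def false_negatives(spam_scores, cutoff):
    """Count spam messages that would not be classified as spam."""
    return sum(s < cutoff for s in spam_scores)


# Hypothetical scores for two min_dev settings (not real measurements).
ham_a, spam_a = [0.02, 0.10, 0.30, 0.885], [0.80, 0.90, 0.97, 0.99]
ham_b, spam_b = [0.03, 0.15, 0.40, 0.957], [0.93, 0.96, 0.99, 0.995]

# Fixed cutoff: the two settings look different.
fixed = 0.95
fixed_a = (false_positives(ham_a, fixed), false_negatives(spam_a, fixed))
fixed_b = (false_positives(ham_b, fixed), false_negatives(spam_b, fixed))

# Matched false-positive count: retune the cutoff per setting, then compare.
cut_a = cutoff_for_fp_target(ham_a, max_fp=1)
cut_b = cutoff_for_fp_target(ham_b, max_fp=1)
matched_a = false_negatives(spam_a, cut_a)
matched_b = false_negatives(spam_b, cut_b)

print("fixed cutoff (fp, fn):", fixed_a, fixed_b)
print("matched fp, fn:", matched_a, matched_b)
```

With these made-up numbers the two settings differ at the fixed cutoff but
give the same false-negative count once the cutoff is retuned, which is
the kind of effect a fixed spam_cutoff can hide.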
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
| Help free our mailboxes. Include |
| http://wecanstopspam.org in your signature. |