min_dev

Tom Anderson tanderso at oac-design.com
Wed Jun 30 13:26:58 CEST 2004


On Tue, 2004-06-29 at 21:52, David Relson wrote:
> > min_dev     = 0.465
> > ham_cutoff  = 0.15
> > spam_cutoff = 0.51
> 
> With that min_dev, all the tokens going into the final score will have
> extreme scores.  I run with a comparable min_dev and often see message
> scores based on just 5 or 6 tokens because the large min_dev excludes
> everything else.  Today I had a false negative with just 3 tokens used.

That's exactly why I suggested removing the dependence on 0.5 in the
min_dev calculation.  When your cutoffs are nowhere near centered around
0.5, you need either a huge min_dev to encompass your actual unsure
zone, or a tiny one to make it insignificant.

In the case of Tom's numbers above, his unsure zone runs from 0.15 to
0.51, so it is centered at 0.33 with a half-width (the natural min_dev)
of 0.18.  There's no reason to ignore a token score of 0.6, because that
would be very spammy given those cutoffs.  But with a moderately sized
min_dev such as 0.18 centered at 0.5 instead of 0.33, the excluded band
is 0.32 to 0.68 and such a token is ignored, contributing to false
negatives.  Conversely, a token that scores 0.3 should actually be
unsure, whereas bogofilter currently counts it as somewhat hammy with a
min_dev of 0.18, again contributing to false negatives.  The effect
would be even more severe if his spam cutoff were lower.
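
To make the arithmetic above concrete, here is a small Python sketch.  It
assumes a simplified model in which a token is used only when its score
lies at least min_dev away from the center; the function names
unsure_zone and is_used are made up for illustration and are not
bogofilter code.

```python
def unsure_zone(ham_cutoff, spam_cutoff):
    """Return (center, half_width) of the zone between the two cutoffs."""
    center = (ham_cutoff + spam_cutoff) / 2
    half_width = (spam_cutoff - ham_cutoff) / 2
    return center, half_width


def is_used(token_score, min_dev, center):
    """Simplified model (an assumption): a token counts toward the final
    score only if it lies at least min_dev away from the center."""
    return abs(token_score - center) >= min_dev


# Cutoffs quoted at the top of this message.
center, half_width = unsure_zone(0.15, 0.51)  # roughly 0.33 and 0.18

for score in (0.3, 0.6):
    fixed = is_used(score, 0.18, 0.5)       # center fixed at 0.5
    shifted = is_used(score, 0.18, center)  # center at the unsure midpoint
    print(f"token {score}: used with center 0.5: {fixed}, "
          f"used with center {center:.2f}: {shifted}")
```

Under this model the 0.6 token is dropped when the center is 0.5 but
kept when the center is 0.33, and the 0.3 token behaves the other way
around.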

In order to reduce false negatives, users are forced either to increase
min_dev so that falsely hammy tokens are not counted, or to shrink it to
insignificance so that spammy tokens aren't ignored.  Either way,
min_dev stops doing useful work.  To restore its intended purpose, and
to allow moderately sized min_devs to be effective, I believe that the
centering at 0.5 must be changed.  We can either go to an exclusion min
and max, or provide a parameter to change the center.

The option with the least effect on existing users would be to add a
parameter to the configuration file to change the center, defaulting to
0.5.  Anyone who does not want to change it could leave it alone with no
change to their scoring, while those who prefer to experiment could
modify it.
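
A rough Python sketch of how such an optional setting could behave.  The
parameter name min_dev_center is hypothetical (no such option exists in
bogofilter as far as this message is concerned), and parse_config and
exclusion_band are illustrative helpers, not bogofilter's real config
parser.

```python
def parse_config(text):
    """Parse 'key = value' lines into floats; '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = float(value)
    return settings


def exclusion_band(settings):
    """Return the (low, high) band of token scores that would be ignored.

    min_dev_center is a hypothetical parameter; when it is absent the
    center falls back to 0.5, so existing configurations are unaffected.
    """
    center = settings.get("min_dev_center", 0.5)
    min_dev = settings["min_dev"]
    return center - min_dev, center + min_dev


old_style = parse_config("min_dev = 0.18\n")
new_style = parse_config("min_dev = 0.18\nmin_dev_center = 0.33\n")

print(exclusion_band(old_style))  # band around 0.5, as today
print(exclusion_band(new_style))  # band around 0.33
```

With the center set to 0.33 the ignored band works out to roughly 0.15
to 0.51, which matches the cutoffs quoted above, so a center parameter
and an explicit exclusion min and max are two ways of writing the same
thing.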

Tom




