min_dev

David Relson relson at osagesoftware.com
Wed Jun 30 03:52:05 CEST 2004


On Tue, 29 Jun 2004 21:35:41 -0400
Tom Allison wrote:

...[snip]...

> So many degrees of freedom.
> Imagine how long bogotune would take if it has to go through these 
> variations!
> 
> Anecdotally, I find the most Unsure mail tends to have a small number
> of very high scoring spam-tokens (>0.9) and a large number of high
> scoring ham-tokens (<0.4).  The pattern is similar for both spam and
> ham and from 10-20 that I've considered carefully, I can't really
> decide what works best.
> 
> Currently:
> min_dev     = 0.465
> ham_cutoff  = 0.15
> spam_cutoff = 0.51
> 
> And I haven't even started to play with these ESF variables.

Tom,

With min_dev = 0.465, only tokens scoring below 0.035 or above 0.965
go into the final score.  I run with a comparable min_dev and often see
message scores based on just 5 or 6 tokens because the large min_dev
excludes everything else.  Today I had a false negative with just 3
tokens used.
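
To make the min_dev effect concrete, here is a rough sketch in Python
(not bogofilter's actual code).  The function name and the sample
scores are made up for illustration, and I'm assuming a token exactly
min_dev away from 0.5 is kept; the real boundary handling may differ.

```python
def tokens_used(scores, min_dev):
    """Keep only token scores at least min_dev away from the neutral 0.5.

    Hypothetical helper: mimics the idea of min_dev, not bogofilter's code.
    """
    return [p for p in scores if abs(p - 0.5) >= min_dev]


# Made-up token scores for one message.
scores = [0.99, 0.90, 0.50, 0.30, 0.02, 0.97]

# With min_dev = 0.465 only scores <= 0.035 or >= 0.965 survive.
kept = tokens_used(scores, 0.465)
print(len(kept), "of", len(scores), "tokens used:", kept)
```

Even fairly strong tokens such as 0.90 are dropped, which is how a
message can end up being scored on only a handful of tokens.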

I've also seen lots of Unsures where there are 2 or 3 times as many ham
tokens as spam tokens (and vice versa :-).  It's all a result of the
inverse chi-square test saying "comparable amounts of good stuff and of
bad stuff; therefore there's no certainty as to ham or spam".  Such is
life.
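
For the curious, here is a small Python sketch of Fisher-style
combining with the inverse chi-square function, in the form commonly
described for Robinson-Fisher scoring.  It is an illustration under
that assumption, not bogofilter's source; the function names and the
token scores are invented.

```python
import math


def chi2q(x, df):
    """Probability that a chi-square variable with df degrees of freedom
    exceeds x.  Closed form, valid only for even df."""
    assert df % 2 == 0
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combined_score(probs):
    """Combine per-token spam probabilities into one message score."""
    n = len(probs)
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Strong evidence on both sides cancels out and lands in the middle.
    return (1.0 + spam_evidence - ham_evidence) / 2.0


# Invented examples: every token is extreme, as with a large min_dev.
print(combined_score([0.99, 0.99, 0.99, 0.99]))  # all spammy tokens
print(combined_score([0.99, 0.99, 0.01, 0.01]))  # evenly split
print(combined_score([0.99, 0.01, 0.01, 0.01]))  # 3x as many ham tokens
```

With Tom's cutoffs (ham_cutoff = 0.15, spam_cutoff = 0.51), the last
two messages both fall between the cutoffs and so come out Unsure, even
though the final one has three ham tokens for every spam token.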

David


