min_dev
David Relson
relson at osagesoftware.com
Wed Jun 30 03:52:05 CEST 2004
On Tue, 29 Jun 2004 21:35:41 -0400
Tom Allison wrote:
...[snip]...
> So many degrees of freedom.
> Imagine how long bogotune would take if it has to go through these
> variations!
>
> Anecdotally, I find the most Unsure mail tends to have a small number
> of very high scoring spam-tokens (>0.9) and a large number of high
> scoring ham-tokens (<0.4). The pattern is similar for both spam and
> ham and from 10-20 that I've considered carefully, I can't really
> decide what works best.
>
> Currently:
> min_dev = 0.465
> ham_cutoff = 0.15
> spam_cutoff = 0.51
>
> And I haven't even started to play with these ESF variables.
Tom,
With that min_dev, only tokens scoring within 0.035 of 0 or 1 go into
the final score. I run with a comparable min_dev and often see message
scores based on just 5 or 6 tokens because the large min_dev excludes
everything else. Today I had a false negative with just 3 tokens used.
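
To make the effect concrete, here is a rough Python sketch of the
min_dev cut. It is not bogofilter's code (that is in C); the function
name and the token scores are made up for illustration, and I'm
assuming a token is kept when its score is at least min_dev away
from 0.5.

```python
def significant_tokens(token_scores, min_dev):
    """Keep only tokens whose score is at least min_dev away from 0.5.

    token_scores: dict of token -> spamicity in (0, 1) (hypothetical data).
    """
    return {tok: p for tok, p in token_scores.items()
            if abs(p - 0.5) >= min_dev}


# Hypothetical per-token scores for one message.
scores = {
    "viagra": 0.99,
    "unsubscribe": 0.93,
    "free": 0.85,
    "the": 0.51,
    "bogofilter": 0.04,
    "meeting": 0.02,
}

# Tom's setting: only scores <= 0.035 or >= 0.965 survive.
kept = significant_tokens(scores, min_dev=0.465)
print(sorted(kept))
```

Even fairly strong tokens such as 0.93 or 0.04 are dropped at this
setting, which is why so few tokens end up in the final score.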
I've also seen lots of Unsures where there are 2 or 3 times as many ham
tokens as spam tokens (and vice versa :-). It's all a result of the
inverse chi-square test saying "comparable amounts of good stuff and of
bad stuff; therefore no certainty as to ham or spam". Such is life.
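
For anyone who wants to see the arithmetic, below is a minimal Python
sketch of Fisher-style chi-square combining as it is usually described
for Bayesian spam filters. It is an illustration under assumptions, not
bogofilter's actual code: the function names and token scores are
hypothetical, and Robinson's smoothing (robs, robx) is left out. The
cutoffs are the ones Tom quoted.

```python
import math


def chi2q(x, dof):
    """Upper-tail chi-square probability for an even number of degrees
    of freedom, using the closed-form series."""
    assert dof % 2 == 0
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token scores into one message score in [0, 1]."""
    n = len(probs)
    # Evidence for spam: how unlikely the (1 - p) values are by chance.
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: the same test applied to the p values.
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spam - ham) / 2.0


HAM_CUTOFF, SPAM_CUTOFF = 0.15, 0.51  # from Tom's configuration

# Hypothetical message: 2 extreme spam tokens, 6 extreme ham tokens.
mixed = [0.99, 0.99] + [0.03] * 6
score = combine(mixed)
print(round(score, 3), HAM_CUTOFF < score < SPAM_CUTOFF)
```

With three times as many ham tokens as spam tokens, both tests still
report strong evidence, so the combined score lands between the two
cutoffs and the message is Unsure.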
David