Testing fisher

David Relson relson at osagesoftware.com
Tue Jan 28 16:13:27 CET 2003


At 09:50 AM 1/28/03, Boris 'pi' Piwinger wrote:

>David Relson wrote:
>
> >>0.025 for two days now with great success. Still my
> >>.procmailrc has to do some work, but there is basically
> >>nothing I see.
> >
> > With min_dev=0.025 you're discarding tokens with scores from 0.475 to
> > 0.525.  What values are you using for robs and robx?
>
>The default values.

Interesting.  The defaults are ROBS=0.001f and ROBX=0.415f.

I've been using min_dev=0.10 and ROBX=0.415f.  With these values I see that
unknown words get the ROBX value as their spam score and are not included
in the message's spam score, because 0.415 is within min_dev of EVEN_ODDS
(0.5): |0.415 - 0.5| = 0.085, which is less than 0.10.  I thought this a
proper and good relation, i.e. give unknown words a neutral score and then
ignore them.  It seems reasonable.

On the other hand, _you_ are using ROBX=0.415 for unknown words, but with 
min_dev=0.025 giving a discard range of 0.475 to 0.525, those unknown words 
are contributing to the message's spam score.  I'm wondering what effect 
that has on your test results.
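
To make the comparison concrete, here is a minimal Python sketch of the discard rule as described above.  It is not bogofilter's actual code: the function names discard_range and is_counted are hypothetical, and it assumes a token is counted only when its score deviates from EVEN_ODDS by at least min_dev.

```python
EVEN_ODDS = 0.5   # neutral score, as named in the message
ROBX = 0.415      # score assigned to unknown words (the default quoted above)


def discard_range(min_dev):
    """Return the (low, high) band of scores treated as too neutral to count."""
    return (EVEN_ODDS - min_dev, EVEN_ODDS + min_dev)


def is_counted(score, min_dev):
    """Assumed rule: a token counts only if it deviates from
    EVEN_ODDS by at least min_dev."""
    return abs(score - EVEN_ODDS) >= min_dev


# Compare the two settings discussed in this thread.
for min_dev in (0.10, 0.025):
    low, high = discard_range(min_dev)
    status = "counted" if is_counted(ROBX, min_dev) else "ignored"
    print(f"min_dev={min_dev}: discard {low:.3f} to {high:.3f}, "
          f"unknown words {status}")
```

Under that assumption, unknown words are ignored with min_dev=0.10 and counted with min_dev=0.025, which is the difference between the two setups.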

Observation: When ROBX < EVEN_ODDS - min_dev, unknown words are scored as 
hammish words and are included in the message score.
Hypothesis:  This contributes to the false negative count.
Experiment:  Change ROBX so that it's closer to EVEN_ODDS, say 0.48, and 
then rerun your test with min_dev=0.15, 0.20, and 0.25.
Expected result:  0.15 will have more false negatives than 0.25 (due to 
including/excluding unknown words).






