Testing fisher
David Relson
relson at osagesoftware.com
Tue Jan 28 16:13:27 CET 2003
At 09:50 AM 1/28/03, Boris 'pi' Piwinger wrote:
>David Relson wrote:
>
> >>0.025 for two days now with great success. Still my
> >>.procmailrc has to do some work, but there is basically
> >>nothing I see.
> >
> > With min_dev=0.025 you're discarding tokens with scores from 0.475 to
> > 0.525. What values are you using for robs and robx?
>
>The default values.

Interesting. The defaults are ROBS=0.001f and ROBX=0.415f.

I've been using min_dev=0.10 and ROBX=0.415f. With these values I see
unknown words get the ROBX value as their spam scores and that they aren't
included in the message's spam score because 0.415 is within min_dev of
EVEN_ODDS (0.5). I thought this a proper and good relation, i.e. give
unknown words a neutral score and then ignore them. It seems reasonable.
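
To illustrate why a never-seen word ends up with exactly ROBX, here is a minimal Python sketch of Robinson's smoothing formula f(w) = (s*x + n*p(w)) / (s + n), with s = ROBS and x = ROBX. The function name and the simplified p(w) (raw counts, spam and ham corpora assumed equally large) are assumptions made for illustration, not bogofilter's actual code.

```python
EVEN_ODDS = 0.5
ROBS = 0.001   # default weight given to the prior (s)
ROBX = 0.415   # default score assumed for an unknown word (x)

def robinson_f(spam_count, ham_count, robs=ROBS, robx=ROBX):
    """Robinson's smoothed token score f(w) = (s*x + n*p) / (s + n).

    Assumption: p comes from raw counts, i.e. both corpora are
    treated as equally large. This is a simplification.
    """
    n = spam_count + ham_count
    p = spam_count / n if n else 0.0   # p drops out when n == 0
    return (robs * robx + n * p) / (robs + n)

unknown = robinson_f(0, 0)                   # never-seen word: f(w) reduces to x
ignored = abs(unknown - EVEN_ODDS) < 0.10    # inside the min_dev=0.10 band
print(f"unknown word score: {unknown:.3f}, ignored: {ignored}")
```

With n = 0 the count term vanishes and the score collapses to ROBX, which is 0.085 away from EVEN_ODDS and so falls inside a min_dev of 0.10.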
On the other hand, _you_ are using ROBX=0.415 for unknown words, but with
min_dev=0.025 giving a discard range of 0.475 to 0.525, those unknown words
are contributing to the message's spam score. I'm wondering what effect
that has on your test results.
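
The difference between the two setups can be sketched as a simple band check. This is an illustration only: `is_counted` is a hypothetical helper, and how a score exactly on the band edge is treated is an assumption.

```python
EVEN_ODDS = 0.5

def is_counted(score, min_dev):
    """True if a token score is far enough from EVEN_ODDS to be used.

    Assumption: scores strictly inside the band are dropped; the
    thread does not say how the exact boundary is handled.
    """
    return abs(score - EVEN_ODDS) >= min_dev

robx = 0.415  # score given to unknown words
for min_dev in (0.10, 0.025):
    low, high = EVEN_ODDS - min_dev, EVEN_ODDS + min_dev
    print(f"min_dev={min_dev}: discard {low:.3f}..{high:.3f}, "
          f"unknown words counted: {is_counted(robx, min_dev)}")
```

With min_dev=0.10 the unknown-word score 0.415 lies inside the discard range; with min_dev=0.025 it lies outside and is counted.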
Observation: When ROBX < EVEN_ODDS - min_dev, unknown words are included in
the message score as hammish words.
Hypothesis: This contributes to the false negative count.
Experiment: Change ROBX so that it's closer to EVEN_ODDS, say 0.48, and
then rerun your test with min_dev=0.015, 0.020, and 0.025.
Expected result: 0.015 will have more false negatives than 0.025 (due to
including/excluding unknown words).
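
A toy version of the hypothesis, as a rough sketch: unknown words at ROBX=0.48 sit 0.02 away from EVEN_ODDS, so the code compares a min_dev just below and just above that distance. The combining step is the chi-square (Fisher) method as commonly published for Robinson-style filters; it is not taken from bogofilter's source, and the message's token scores are invented for illustration.

```python
import math

EVEN_ODDS = 0.5

def chi2q(x2, dof):
    """Upper-tail chi-square probability; valid for even dof only."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(scores, min_dev):
    """Combine token scores, skipping those within min_dev of EVEN_ODDS."""
    used = [f for f in scores if abs(f - EVEN_ODDS) >= min_dev]
    if not used:
        return EVEN_ODDS
    n = len(used)
    spam_ind = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in used), 2 * n)
    ham_ind = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in used), 2 * n)
    return (1.0 + spam_ind - ham_ind) / 2.0

# Hypothetical message: three spammy known tokens plus five unknown words.
robx = 0.48
message = [0.9, 0.8, 0.7] + [robx] * 5

for min_dev in (0.015, 0.025):
    print(f"min_dev={min_dev}: score={fisher_score(message, min_dev):.3f}")
```

In this sketch the smaller min_dev lets the five slightly hammish unknown words into the calculation and the message score drops, which is the direction that would produce more false negatives. Whether the effect shows up on a real corpus is what the experiment is meant to find out.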