Testing fisher
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Jan 29 11:49:34 CET 2003
David Relson wrote:
> On the other hand, _you_ are using ROBX=0.415 for unknown words, but with
> min_dev=0.025 giving a discard range of 0.475 to 0.525, those unknown words
> are contributing to the message's spam score. I'm wondering what effect
> that has on your test results.
>
> Observation: "ROBX < EVEN_ODDS - min_dev" includes hammish words in the
> message score.
> Hypothesis: This contributes to the false negative count.
> Experiment: Change ROBX so that it's closer to EVEN_ODDS, say 0.48, and
> then rerun your test with min_dev=0.15, 0.20, and 0.25.
> Expected result: 0.15 will have more false negatives than 0.25 (due to
> including/excluding unknown words).
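The discard-range arithmetic in the quoted text can be checked with a small sketch. It is not bogofilter's code: the function names are made up for illustration, EVEN_ODDS is taken to be 0.5 (consistent with the 0.475 to 0.525 range quoted above), and treating a word exactly on the boundary as counted is an assumption.

```python
EVEN_ODDS = 0.5  # assumed neutral score; matches the 0.475..0.525 range quoted above


def discard_range(min_dev):
    """Return the (low, high) band of word scores that is ignored for a given min_dev."""
    return EVEN_ODDS - min_dev, EVEN_ODDS + min_dev


def unknown_words_counted(robx, min_dev):
    """True if a word scored at robx falls outside the discard band.

    Whether a score exactly on the boundary counts is an assumption here.
    """
    return abs(robx - EVEN_ODDS) >= min_dev


if __name__ == "__main__":
    # The robx/min_dev pairs below are the ones used in the test runs in this message.
    for robx, min_dev in [(0.415, 0.025), (0.48, 0.025), (0.48, 0.015)]:
        low, high = discard_range(min_dev)
        counted = unknown_words_counted(robx, min_dev)
        print(f"robx={robx} min_dev={min_dev} "
              f"discard={low:.3f}..{high:.3f} unknown words counted: {counted}")
```

Under these assumptions, robx=0.415 lies outside the band for min_dev=0.025, while robx=0.48 lies inside it for min_dev=0.025 and outside it for min_dev=0.015.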
algorithm  min_dev  spam_cutoff    test.spam       test.ham
                                 total   F-N    total   F-P
robx=0.415
fisher-2   0.10     0.95          4186   364    15140     1
fisher-2   0.25     0.60          4335   191    15362     0
fisher-2   0.20     0.60          4221   184    15140     0
fisher-2   0.15     0.60          4237   170    15251     0
fisher-2   0.10     0.60          4221   139    15140     1
fisher-2   0.075    0.60          4237   132    15251     1
fisher-2   0.05     0.60          4237   116    15251     1
fisher-2   0.035    0.60          4262   101    15251     1
fisher-2   0.025    0.60          4262    89    15251     1
fisher-2   0.02     0.60          4297    92    15362     1
fisher-2   0.015    0.60          4295    92    15361     1
fisher-2   0.00     0.60          4221   140    15140     1
robx=0.48
fisher-2   0.025    0.60          4367   198    15479     0
fisher-2   0.020    0.60          4367   204    15479     0
fisher-2   0.015    0.60          4367   182    15479     0
So your expectation does not hold for me. Interestingly enough, with
robx=0.48 I get significantly *more* FNs (182 to 204, versus 89 to 92
with robx=0.415 at the same min_dev values).
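Since the spam totals differ slightly between runs, the FN counts from the table can be put on a common scale as percentages. A minimal sketch, using only the rows with min_dev between 0.015 and 0.025 (the helper name fn_rate is made up for illustration):

```python
# (robx, min_dev, spam_total, false_negatives), copied from the table above
ROWS = [
    (0.415, 0.025, 4262, 89),
    (0.415, 0.020, 4297, 92),
    (0.415, 0.015, 4295, 92),
    (0.48, 0.025, 4367, 198),
    (0.48, 0.020, 4367, 204),
    (0.48, 0.015, 4367, 182),
]


def fn_rate(false_negatives, spam_total):
    """False-negative rate as a percentage of the spam messages tested."""
    return 100.0 * false_negatives / spam_total


if __name__ == "__main__":
    for robx, min_dev, total, fn in ROWS:
        print(f"robx={robx:<5} min_dev={min_dev:.3f} "
              f"FN rate={fn_rate(fn, total):.2f}%")
```

This gives roughly 2% for the robx=0.415 rows and more than 4% for the robx=0.48 rows.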
pi