Testing fisher

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jan 29 11:49:34 CET 2003


David Relson wrote:

> On the other hand, _you_ are using ROBX=0.415 for unknown words, but with 
> min_dev=0.025 giving a discard range of 0.475 to 0.525, those unknown words 
> are contributing to the message's spam score.  I'm wondering what effect 
> that has on your test results.
> 
> Observation: "ROBX < EVEN_ODDS - min_dev" includes hammish words in the 
> message score.
> Hypothesis:  This contributes to the false negative count.
> Experiment:  Change ROBX so that it's closer to EVEN_ODDS, say 0.48, and 
> then rerun your test with min_dev=0.15, 0.20, and 0.25.
> Expected result:  0.15 will have more false negatives than 0.25 (due to 
> including/excluding unknown words).
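
For reference, the discard-range arithmetic David describes can be
sketched as follows. This is only an illustration, not bogofilter's
actual code: the function names are made up, and the strict ">"
comparison at the edge of the range is an assumption.

```python
EVEN_ODDS = 0.5  # neutral word score, as named in the quoted message


def discard_range(min_dev):
    """Return the (low, high) band of scores that would be ignored."""
    return (EVEN_ODDS - min_dev, EVEN_ODDS + min_dev)


def contributes(score, min_dev):
    """True if a word with this score lies outside the discard range.

    Assumption: a score exactly on the edge of the range is discarded.
    Edge cases such as robx=0.48 with min_dev=0.02 are also sensitive
    to floating-point rounding, so they are not evaluated here.
    """
    return abs(score - EVEN_ODDS) > min_dev


if __name__ == "__main__":
    # (robx, min_dev) pairs taken from the quoted message and the runs below
    for robx, min_dev in [(0.415, 0.025), (0.415, 0.10),
                          (0.48, 0.025), (0.48, 0.015)]:
        low, high = discard_range(min_dev)
        print(f"robx={robx} min_dev={min_dev} "
              f"discard=({low:.3f}, {high:.3f}) "
              f"unknown words count: {contributes(robx, min_dev)}")
```

Under these assumptions, unknown words scored at robx=0.415 count
toward the message score for any min_dev below 0.085, while at
robx=0.48 they count only for min_dev below 0.02.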


algorithm    min_dev    spam_cutoff    test.spam    test.ham
                                       total  F-N   total F-P
robx=0.415
fisher-2        0.10          0.95     4186   364   15140  1
fisher-2        0.25          0.60     4335   191   15362  0
fisher-2        0.20          0.60     4221   184   15140  0
fisher-2        0.15          0.60     4237   170   15251  0
fisher-2        0.10          0.60     4221   139   15140  1
fisher-2        0.075         0.60     4237   132   15251  1
fisher-2        0.05          0.60     4237   116   15251  1
fisher-2        0.035         0.60     4262   101   15251  1
fisher-2        0.025         0.60     4262    89   15251  1
fisher-2        0.02          0.60     4297    92   15362  1
fisher-2        0.015         0.60     4295    92   15361  1
fisher-2        0.00          0.60     4221   140   15140  1

robx=0.48
fisher-2        0.025         0.60     4367   198   15479  0
fisher-2        0.020         0.60     4367   204   15479  0
fisher-2        0.015         0.60     4367   182   15479  0

So your expectation does not hold for me. Interestingly enough, with
robx=0.48 I get significantly *more* FNs (182 to 204, versus 89 to 92
with robx=0.415 at the same min_dev values).
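
Since the test.spam totals differ slightly between runs, it may be
fairer to compare false-negative rates than raw counts. A small
sketch, using only the rows from the table above where both robx
values were run with the same min_dev (the helper name fn_rate is
made up for this example):

```python
# (robx, min_dev, test.spam total, F-N), copied from the table above
rows = [
    (0.415, 0.025, 4262, 89),
    (0.415, 0.020, 4297, 92),
    (0.415, 0.015, 4295, 92),
    (0.48, 0.025, 4367, 198),
    (0.48, 0.020, 4367, 204),
    (0.48, 0.015, 4367, 182),
]


def fn_rate(total, false_negatives):
    """Fraction of spam messages that were not classified as spam."""
    return false_negatives / total


if __name__ == "__main__":
    for robx, min_dev, total, fn in rows:
        print(f"robx={robx:<6} min_dev={min_dev:<6} "
              f"FN rate = {fn_rate(total, fn):.2%}")
```

This only restates the table as percentages; it says nothing about
why the rates differ.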

pi




