chi wins

Wed Nov 27 13:51:25 CET 2002

Well, unless you do find a bug in my code, it seems the Bayesian chain
rule with f(w) is just a bit weaker than chi (Robinson-Fisher, as it's
known in bogofilter); with straight p(w), it's a lot weaker.  The table
shows mean false negatives as a percentage, with 95% confidence limits,
for an experiment of the usual comparison type, with spam cutoffs set to
give the same number of false positives for each method.  In the table,
bcrp is the chain rule with p(w) and bcrf is the chain rule with f(w):

  calc meanfnpc lcl95 ucl95
1  chi     8.31  7.59  9.04
2 bcrp    15.55 14.82 16.28
3 bcrf    10.02  9.30 10.75

The differences are statistically significant at the 0.05 level.
The attached .png file is a graph showing these results.
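
The significance claim can be sanity-checked from the table alone.  The
sketch below is only an illustration, not the test actually used for the
experiment: it copies the three rows of the table and lists the pairs
whose 95% confidence intervals do not overlap.  Non-overlap is a
conservative criterion, so pairs that pass it are consistent with a
difference at the 0.05 level.  The names results, intervals_overlap and
disjoint_pairs are made up for this example.

```python
# Rows copied from the table: (mean FN %, lower 95% limit, upper 95% limit).
results = {
    "chi":  (8.31, 7.59, 9.04),
    "bcrp": (15.55, 14.82, 16.28),
    "bcrf": (10.02, 9.30, 10.75),
}


def intervals_overlap(a, b):
    """Return True if the [lcl95, ucl95] ranges of rows a and b share a point."""
    _, a_lo, a_hi = a
    _, b_lo, b_hi = b
    return a_lo <= b_hi and b_lo <= a_hi


def disjoint_pairs(table):
    """List every pair of methods whose confidence intervals do not overlap."""
    names = sorted(table)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if not intervals_overlap(table[x], table[y])
    ]


if __name__ == "__main__":
    for x, y in disjoint_pairs(results):
        print(f"{x} vs {y}: intervals do not overlap")
```

The check is deliberately strict: overlapping intervals would not by
themselves rule out a significant difference, so a proper pairwise test
on the raw per-run data remains the better tool.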

It would be possible to run a larger test; I would suggest dropping the
straight p(w) calculation so we can concentrate on comparing chi with
bcr using f(w) for both.
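
For readers who want the two calculations side by side, here is a
minimal sketch of what is being compared.  It is written from the
published descriptions of the methods, not taken from bogofilter or
from the test code, and the function names and the defaults s=1.0 and
x=0.5 are assumptions chosen for illustration.  f_w is Robinson's
smoothed per-word estimate, bayes_chain is the Bayesian chain rule, and
chi_combine is the Robinson-Fisher (chi-square) combination.

```python
import math


def f_w(p_w, n, s=1.0, x=0.5):
    """Robinson's f(w): shrink p(w) toward the prior x when the word count n is small.

    s (strength of the prior) and x (assumed probability for an unseen
    word) are illustrative defaults, not tuned values.
    """
    return (s * x + n * p_w) / (s + n)


def chi2q(chi_sq, dof):
    """Upper-tail probability of a chi-square value; dof must be even."""
    m = chi_sq / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def bayes_chain(probs):
    """Bayesian chain rule: prod(p) / (prod(p) + prod(1 - p))."""
    spam = 1.0
    ham = 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


def chi_combine(probs):
    """Robinson-Fisher combination; values near 1 indicate spam.

    Each p must lie strictly between 0 and 1 so the logarithms exist.
    """
    n = len(probs)
    from_p = chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    from_1mp = chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + from_p - from_1mp) / 2.0


if __name__ == "__main__":
    # Hypothetical per-word values, each seen 3 times in training.
    raw = [0.99, 0.95, 0.20, 0.70]
    smoothed = [f_w(p, n=3) for p in raw]
    print("bcr with p(w):", bayes_chain(raw))
    print("bcr with f(w):", bayes_chain(smoothed))
    print("chi with f(w):", chi_combine(smoothed))
```

Feeding the same f(w) values to both bayes_chain and chi_combine is the
comparison proposed above: the per-word estimate is held fixed and only
the combining step differs.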

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bcr.png
Type: image/png
Size: 4102 bytes
Desc: not available
URL: <https://www.bogofilter.org/pipermail/bogofilter/attachments/20021127/9072a083/attachment.png>