chi wins
Greg Louis
glouis at dynamicro.on.ca
Wed Nov 27 13:51:25 CET 2002
Well, unless you do find a bug in my code, it seems the Bayesian chain
rule with f(w) is just a bit weaker than chi (Robinson-Fisher, as it's
known in bogofilter); with straight p(w), it's a lot weaker. The table
shows mean false negatives as a percent, with 95% confidence limits,
for an experiment of the usual comparison type, with spam cutoffs set to
give the same number of false positives for each method:
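method   mean FN (%)   lower 95% limit   upper 95% limit
chi          8.31            7.59              9.04
bcrp        15.55           14.82             16.28
bcrf        10.02            9.30             10.75
(chi = Robinson-Fisher; bcrp and bcrf = Bayesian chain rule with p(w) and with f(w) respectively)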
The differences are statistically significant at the 0.05 level.
The attached .png file is a graph showing these results.
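The comparison sets the spam cutoff for each method so that every method produces the same number of false positives, then reports false negatives as a mean with 95% limits. Here is a minimal Python sketch of that procedure. It assumes each method gives every message a score in [0, 1] and calls a message spam when its score is above the cutoff. The function names are hypothetical, and the normal-approximation interval is an assumption: the figures in the table may have been computed differently, for example with a t-based interval.

```python
import math
import statistics


def cutoff_for_fp(ham_scores, target_fp):
    """Lowest cutoff at which at most target_fp ham messages score
    strictly above it (messages above the cutoff are called spam)."""
    if not 0 <= target_fp < len(ham_scores):
        raise ValueError("target_fp must be in [0, len(ham_scores))")
    ranked = sorted(ham_scores, reverse=True)
    # Any lower cutoff would pass ranked[0..target_fp], i.e. target_fp + 1
    # false positives; ties at the cutoff can only make the count smaller.
    return ranked[target_fp]


def fn_percent(spam_scores, cutoff):
    """Percentage of spam scoring at or below the cutoff (false negatives)."""
    misses = sum(1 for s in spam_scores if s <= cutoff)
    return 100.0 * misses / len(spam_scores)


def mean_ci95(values):
    """Mean with an approximate 95% interval (normal approximation,
    1.96 standard errors); needs at least two values."""
    mean = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half, mean + half


if __name__ == "__main__":
    # Made-up scores for one run of one method.
    ham = [0.10, 0.20, 0.30, 0.90, 0.95]
    spam = [0.50, 0.85, 0.90, 0.99]
    cut = cutoff_for_fp(ham, target_fp=1)
    print(cut, fn_percent(spam, cut))  # prints: 0.9 75.0
    # Hypothetical FN percentages from several runs of one method.
    print(mean_ci95([8.0, 9.0, 10.0]))
```

Matching false positives first makes the false-negative figures directly comparable, because each method is held to the same cost in misclassified ham.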
It would be possible to run a larger test; I would suggest dropping the
straight p(w) calculation so we can concentrate on comparing chi with
bcr using f(w) for both.
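For readers unfamiliar with the calculations being compared, here is a minimal Python sketch of them. It is written from Gary Robinson's published descriptions, not from bogofilter's source. It assumes the following:
- p(w) = b/(b+g), where b and g are the fractions of spam and ham messages containing w.
- f(w) = (s*x + n*p(w))/(s + n), with strength s and assumed probability x. The defaults below are illustrative.
- "chi" is Fisher's inverse chi-square combining.
- "bcr" is taken to mean the product rule prod(p)/(prod(p) + prod(1 - p)). Repeated application of the chain rule reduces to this with a 0.5 prior and independent words.

The function names, defaults and the clamp constant are hypothetical.

```python
import math

EPS = 1e-6  # assumed clamp so a raw p(w) of exactly 0 or 1 cannot hit log(0)


def p_w(spam_hits, ham_hits, n_spam, n_ham):
    """Robinson's p(w) = b / (b + g), where b and g are the fractions of
    spam and ham messages containing the word (it must occur at least once)."""
    b = spam_hits / n_spam
    g = ham_hits / n_ham
    return b / (b + g)


def f_w(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson's f(w) = (s*x + n*p(w)) / (s + n): pulls p(w) toward the
    assumed probability x when the word appears in few messages n."""
    n = spam_hits + ham_hits
    return (s * x + n * p_w(spam_hits, ham_hits, n_spam, n_ham)) / (s + n)


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def chi2q(x2, df):
    """Upper tail P(X >= x2) of a chi-square variable with even df."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs):
    """Fisher's inverse chi-square combining, as in Robinson's method."""
    probs = [_clamp(p) for p in probs]
    df = 2 * len(probs)
    # s is near 1 when many words are spammy, h when many are hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), df)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), df)
    return (1.0 + s - h) / 2.0


def bcr_combine(probs):
    """Product rule prod(p) / (prod(p) + prod(1 - p)), computed in log
    space to avoid underflow on long messages."""
    probs = [_clamp(p) for p in probs]
    d = sum(math.log(p) - math.log(1.0 - p) for p in probs)
    # Numerically stable logistic of d.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)
```

The two combiners treat mixed evidence differently. When a message contains both strongly spammy and strongly hammy words, s and h in chi_combine both approach 1 and the score settles near 0.5. The product in bcr_combine, by contrast, is pushed toward 0 or 1 by a few extreme words. This sketch does not by itself explain the measured differences.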
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bcr.png
Type: image/png
Size: 4102 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20021127/9072a083/attachment.png>