training on errors only
Greg Louis
glouis at dynamicro.on.ca
Sat Nov 30 21:16:26 CET 2002
On 20021130 (Sat) at 1112:39 -0500, Greg Louis wrote:
> The Robinson-Fisher method still has the advantage -- it can achieve
> 2.1% false positives and 4.5% false negatives in this test -- but it
> was the evaluation algorithm used in the training. To be fair (this is
> why this report is just preliminary), I need to run the test separately
> for each algorithm, training with that algorithm.
Well, unless I'm doing something wrong, the chain rule method likes to
be symmetrical. That is, if you allow it (with my data) to make around
2-3 percent false positives, it'll make around 2-3 percent false
negatives, as we saw both yesterday and today; but if you try to limit
the false positives by adjusting a cutoff level, it doesn't at all want
to play that game and the false negatives go through the roof.
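To make the trade-off concrete, here is a minimal sketch (score lists are made-up illustrations, not my test data) of how raising the spam cutoff moves errors from one column to the other:

```python
def error_rates(ham_scores, spam_scores, cutoff):
    """Classify as spam when score >= cutoff; return (fp%, fn%)."""
    fp = sum(s >= cutoff for s in ham_scores) / len(ham_scores) * 100
    fn = sum(s < cutoff for s in spam_scores) / len(spam_scores) * 100
    return fp, fn

ham = [0.05, 0.10, 0.30, 0.45, 0.55, 0.60]   # hypothetical ham scores
spam = [0.40, 0.52, 0.70, 0.85, 0.90, 0.95]  # hypothetical spam scores

# As the cutoff rises, false positives fall and false negatives climb.
for cutoff in (0.5, 0.6, 0.7):
    fp, fn = error_rates(ham, spam, cutoff)
    print(f"cutoff={cutoff:.1f}  fp={fp:5.1f}%  fn={fn:5.1f}%")
```

With a well-behaved score distribution the two error rates trade off smoothly; the complaint above is that the chain rule's scores bunch up so badly near the ends that the trade is all-or-nothing.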
Trained on errors only, Robinson-Fisher is quite sweet: it allows me to
raise the spam_cutoff to the point where it's seeing 0.6% false
positives, and it then lets just under 10% of spam through. I can't do
this to the Bayesian chain rule:
  calc    run  fneg  percent
1 chi       0   145    10.24
2 chi       1   119     8.40
3 chi       2   139     9.82
4 bcr-pw    0   508    35.88
5 bcr-pw    1   533    37.64
6 bcr-pw    2   556    39.27
7 bcr-fw    0   405    28.60
8 bcr-fw    1   422    29.80
9 bcr-fw    2   429    30.30
Here, each algorithm used a training database created with that
algorithm, trained on errors only. The spam cutoffs were chosen such
that about 12 false positives were reported per run. This was quite
disastrous for Bayes chain, as you can see.
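One simple way to pick a cutoff that yields about 12 false positives per run (an assumption on my part, not necessarily how the runs above were tuned) is to rank the ham scores and set the cutoff at the 13th-highest, so only the 12 scores above it get misclassified:

```python
import random

def cutoff_for_target_fp(ham_scores, target_fp=12):
    """Return a cutoff such that exactly target_fp ham messages score
    above it, assuming the scores are distinct."""
    ranked = sorted(ham_scores, reverse=True)
    # Classify as spam when score > cutoff; the target_fp highest
    # ham scores then land on the wrong side of the line.
    return ranked[target_fp]

# Illustration with random scores standing in for real ham scores.
random.seed(1)
ham = [random.random() for _ in range(1000)]
cutoff = cutoff_for_target_fp(ham, 12)
print(cutoff, sum(s > cutoff for s in ham))  # second number is 12
```

The false-negative counts in the table then fall out of applying that per-algorithm cutoff to the spam scores.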
Does this mean Robinson-Fisher is better than the Bayes chain rule?
No, we don't yet have proof of that; it just seems that this method of
tuning to balance false positives against false negatives isn't
available with the latter method. If I abandon all pretense of tuning
and let the algorithms all do what comes naturally, things look like
this (cutoff = 0.5); the err column is just the sum of false positives
and false negatives, and the percent column reports err as a
percentage.
  calc    run  fpos  fneg  err  percent
1 chi       0    54    56  110     7.77
2 chi       1    36    40   76     5.37
3 chi       2    41    55   96     6.78
4 bcr-pw    0    47   152  199    14.05
5 bcr-pw    1    38   180  218    15.40
6 bcr-pw    2    45   172  217    15.32
7 bcr-fw    0    92    78  170    12.01
8 bcr-fw    1    76    94  170    12.01
9 bcr-fw    2    76   105  181    12.78
Summarizing (meanerrpc is the mean error percentage over the three
runs; lcl95 and ucl95 are its lower and upper 95% confidence limits):
  calc    meanerrpc  lcl95  ucl95
1 chi          6.64    5.1   8.18
2 bcr-pw      14.92   13.4  16.47
3 bcr-fw      12.26   10.7  13.81
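For anyone wanting to check the arithmetic, here is a sketch of the summary calculation. Note it computes a per-group t interval with 2 degrees of freedom; the limits tabulated above may come from a pooled-variance estimate instead, so they won't match exactly:

```python
from math import sqrt
from statistics import mean, stdev

T_975_DF2 = 4.303  # two-sided 95% t critical value, 2 degrees of freedom

def summarize(err_percents):
    """Mean error percentage and a naive per-group 95% t interval."""
    m = mean(err_percents)
    se = stdev(err_percents) / sqrt(len(err_percents))
    return m, m - T_975_DF2 * se, m + T_975_DF2 * se

runs = {
    "chi":    [7.77, 5.37, 6.78],
    "bcr-pw": [14.05, 15.40, 15.32],
    "bcr-fw": [12.01, 12.01, 12.78],
}
for name, errs in runs.items():
    m, lo, hi = summarize(errs)
    print(f"{name:7s} mean={m:5.2f}  95% CI ({lo:5.2f}, {hi:5.2f})")
```

Either way, the chi and bcr intervals are well separated with these data.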
At this point, therefore, it looks as though chi does win. I'll write
this up more formally and put it on my website; the URL will be
http://www.bgl.nu/~glouis/bogofilter/BcrFisher.html when I get it done,
but that will probably be some time tomorrow afternoon. I'll also post
the relevant bits of code in case anybody can see mistakes.
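In the meantime, the core of the chi calculation is Fisher's chi-square method for combining probabilities. A minimal sketch (my reconstruction, not the actual bogofilter source; the real thing combines separate spamminess and hamminess indicators, which I omit here):

```python
from math import exp, log

def chi2_sf_even_df(x, df):
    """Survival function P(X > x) for a chi-square variable with an
    even number df of degrees of freedom, via the closed-form series
    exp(-x/2) * sum_{k < df/2} (x/2)**k / k!."""
    m = x / 2.0
    term = exp(-m)
    total = term
    for k in range(1, df // 2):
        term *= m / k
        total += term
    return min(total, 1.0)

def fisher_combine(probs):
    """Fisher's method: under the null, -2 * sum(ln p_i) follows a
    chi-square distribution with 2n degrees of freedom, so the
    combined p-value is its upper tail probability."""
    x = -2.0 * sum(log(p) for p in probs)
    return chi2_sf_even_df(x, 2 * len(probs))

# A message whose token probabilities lean heavily spammy combines
# to a very small p-value against the "indifferent text" null:
print(fisher_combine([0.01] * 10))
```

As I understand it, this tail probability gets computed once on the spam side and once on the ham side, and the two are folded into the final score; corrections welcome when I post the real code.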
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |