training on errors only

Greg Louis glouis at dynamicro.on.ca
Sat Nov 30 21:16:26 CET 2002


On 20021130 (Sat) at 11:12:39 -0500, Greg Louis wrote:

> The Robinson-Fisher method still has the advantage -- it can achieve
> 2.1% false positives and 4.5% false negatives in this test -- but it
> was the evaluation algorithm used in the training.  To be fair (this is
> why this report is just preliminary), I need to run the test separately
> for each algorithm, training with that algorithm.

Well, unless I'm doing something wrong, the chain rule method likes to
be symmetrical.  That is, if you allow it (with my data) to make around
2-3 percent false positives, it'll make around 2-3 percent false
negatives, as we saw both yesterday and today; but if you try to limit
the false positives by adjusting a cutoff level, it doesn't at all want
to play that game and the false negatives go through the roof.
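To see why the two behave so differently under a moving cutoff, here's a from-scratch sketch of the two combining rules (not bogofilter's actual code; it assumes the per-token spam probabilities p_i have already been computed, and the H/S letter conventions in Robinson's write-up may differ from my variable names).  The chain rule's scores saturate toward 0 or 1, so there's almost nothing between the two piles for a cutoff to slide through, while the Fisher combination spreads scores out:

```python
from math import exp, log

def chi2_sf(x, df):
    # Survival function of a chi-square distribution with *even* df,
    # via the closed-form series -- no scipy needed.
    m = x / 2.0
    term = exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def bayes_chain_rule(probs):
    # Graham-style "Bayesian chain rule":
    #   P = prod(p) / (prod(p) + prod(1 - p))
    # The two products diverge exponentially, so P saturates near 0 or 1.
    p = q = 1.0
    for x in probs:
        p *= x
        q *= 1.0 - x
    return p / (p + q)

def robinson_fisher(probs):
    # Fisher's method applied to the token probabilities and to their
    # complements, combined into an indicator in [0, 1]; near 1 = spam.
    n = len(probs)
    spamlike = chi2_sf(-2.0 * sum(log(x) for x in probs), 2 * n)
    hamlike = chi2_sf(-2.0 * sum(log(1.0 - x) for x in probs), 2 * n)
    return (1.0 + spamlike - hamlike) / 2.0
```

With a mixed message such as six tokens at 0.9 and four at 0.2, the chain rule still lands hard against 1 while the Fisher indicator stays well inside the interval, which is the room the cutoff tuning needs.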

Trained on errors only, Robinson-Fisher is quite sweet: it allows me to
raise the spam_cutoff to the point where it's seeing 0.6% false
positives, and it then lets just under 10% of spam through.  I can't do
this to the Bayesian chain rule:

    calc run fneg percent
1    chi   0  145   10.24
2    chi   1  119    8.40
3    chi   2  139    9.82
4 bcr-pw   0  508   35.88
5 bcr-pw   1  533   37.64
6 bcr-pw   2  556   39.27
7 bcr-fw   0  405   28.60
8 bcr-fw   1  422   29.80
9 bcr-fw   2  429   30.30

Here, each algorithm used a training database created with that
algorithm, trained on errors only.  The spam cutoffs were chosen such
that about 12 false positives were reported per run.  This was quite
disastrous for Bayes chain, as you can see.

Does this mean Robinson-Fisher is better than the Bayes chain rule? 
No, we don't yet have proof of that; it just seems that this way of
tuning to balance false positives against false negatives isn't
available with the Bayesian chain rule.  If I abandon all pretense at tuning
and let the algorithms all do what comes naturally, things look like
this (cutoff = 0.5); the err column is just the sum of false positives
and false negatives, and the percent column reports err as a
percentage.

    calc run fpos fneg err percent
1    chi   0   54   56 110    7.77
2    chi   1   36   40  76    5.37
3    chi   2   41   55  96    6.78
4 bcr-pw   0   47  152 199   14.05
5 bcr-pw   1   38  180 218   15.40
6 bcr-pw   2   45  172 217   15.32
7 bcr-fw   0   92   78 170   12.01
8 bcr-fw   1   76   94 170   12.01
9 bcr-fw   2   76  105 181   12.78
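As a sanity check, those last two columns can be recomputed; working backward from err and percent, each run appears to have covered about 1416 messages (a number inferred from the table, not stated anywhere):

```python
# (fpos, fneg) per run, copied from the table above.
runs = {
    "chi":    [(54, 56), (36, 40), (41, 55)],
    "bcr-pw": [(47, 152), (38, 180), (45, 172)],
    "bcr-fw": [(92, 78), (76, 94), (76, 105)],
}

N = 1416  # messages per run -- inferred from err/percent, an assumption

for calc, results in runs.items():
    for run, (fpos, fneg) in enumerate(results):
        err = fpos + fneg             # the err column
        percent = 100.0 * err / N     # the percent column
        print(f"{calc} run {run}: err={err} percent={percent:.2f}")
```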

Summarizing (meanerrpc is the mean error percentage over the three
runs; lcl95 and ucl95 are the lower and upper 95% confidence limits):
    calc meanerrpc lcl95 ucl95
1    chi      6.64   5.1  8.18
2 bcr-pw     14.92  13.4 16.47
3 bcr-fw     12.26  10.7 13.81
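The means come straight from the per-run percentages; the confidence limits depend on which variance estimate is used, so this sketch uses a plain per-group t-interval (t with 2 degrees of freedom) and won't necessarily reproduce the limits in the table exactly:

```python
from math import sqrt

percents = {
    "chi":    [7.77, 5.37, 6.78],
    "bcr-pw": [14.05, 15.40, 15.32],
    "bcr-fw": [12.01, 12.01, 12.78],
}

T975_DF2 = 4.303  # two-sided 95% t quantile, 2 degrees of freedom

for calc, xs in percents.items():
    n = len(xs)
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    half = T975_DF2 * sd / sqrt(n)   # simple per-group half-width
    print(f"{calc}: mean={mean:.2f} 95% CI [{mean - half:.2f}, {mean + half:.2f}]")
```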

At this point, therefore, it looks as though chi does win.  I'll write
this up more formally and put it up on my website; the url will be
http://www.bgl.nu/~glouis/bogofilter/BcrFisher.html when I get it done,
but that will probably be some time tomorrow afternoon.  I'll also post
the relevant bits of code in case anybody can see mistakes.

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |


