evaluating possible new options

Fri May 16 12:38:40 CEST 2003

On 20030516 (Fri) at 0951:48 +1000, michael at optusnet.com.au wrote:
> Greg Louis <glouis at dynamicro.on.ca> writes:
> > summary(aov(pc ~ fold + head + html + fold*head + fold*html +
> >     head*html + fold*head*html, data=parms))
> >                Df   Sum Sq  Mean Sq  F value    Pr(>F)    
> > fold            1 0.038226 0.038226  26.5486 0.0008716 ***
> > head            1 0.296242 0.296242 205.7430 5.448e-07 ***
> > html            1 0.002262 0.002262   1.5709 0.2454608    
> > fold:head       1 0.061685 0.061685  42.8410 0.0001794 ***
> > fold:html       1 0.001369 0.001369   0.9504 0.3581594    
> > head:html       1 0.000251 0.000251   0.1743 0.6872818    
> > fold:head:html  1 0.000251 0.000251   0.1746 0.6870339    
> > Residuals       8 0.011519 0.001440                       
> 
> A run from my corpus of 84875 spam and 48079 hams. Method
> used was to randomly divide into 4 equal blocks, then
> in turn, use one block to train and then measure against
> that block and the other three.

In these kinds of experiments we don't usually test on the same
messages that were used for training; it complicates the analysis
unless those results are discarded.  But it's interesting to see how
big a difference it makes!
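
To put a number on that difference, here is a rough sketch in Python
using the default-run figures quoted below.  It assumes the four blocks
are exactly equal in size (so each holds 84875/4 spam) and that in
"N against M" the first number is the training block; the names
fn_default and split_rates are mine, for illustration only.

```python
# Rough sketch: compare train-on-self results with held-out results
# for the default run. Assumes equal-sized blocks (an assumption).
SPAM_TOTAL = 84875
BLOCKS = 4
spam_per_block = SPAM_TOTAL / BLOCKS

# False negatives; row = training block, column = test block (assumed).
fn_default = [
    [1425, 4049, 3977, 3863],
    [3770, 1468, 3873, 3812],
    [3859, 3977, 1467, 3829],
    [3923, 4026, 4026, 1505],
]

def split_rates(fn, spam_per_block):
    """Return (train-on-self fn rate, held-out fn rate)."""
    n = len(fn)
    same = [fn[i][i] for i in range(n)]
    held = [fn[i][j] for i in range(n) for j in range(n) if i != j]
    same_rate = sum(same) / len(same) / spam_per_block
    held_rate = sum(held) / len(held) / spam_per_block
    return same_rate, held_rate

same_rate, held_rate = split_rates(fn_default, spam_per_block)
print(f"tested on training block: {same_rate:.1%} of spam missed")
print(f"tested on other blocks:   {held_rate:.1%} of spam missed")
```

On these figures the blocks that were also used for training miss
roughly 7% of their spam, against roughly 18% for the held-out blocks,
which is why mixing the two in one average muddies the result.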

> 
> Default bogofilter 0.12.3 with subject tagging turned on:
> $ perl ./out-crunch out
> CONFIG : Mindev 0.100, RobX 0.415
>          0 against 0   --> false pos     0 false neg  1425
>          0 against 1   --> false pos     0 false neg  4049
>          0 against 2   --> false pos     0 false neg  3977
>          0 against 3   --> false pos     0 false neg  3863
>          1 against 0   --> false pos     0 false neg  3770
>          1 against 1   --> false pos     0 false neg  1468
>          1 against 2   --> false pos     0 false neg  3873
>          1 against 3   --> false pos     0 false neg  3812
>          2 against 0   --> false pos     0 false neg  3859
>          2 against 1   --> false pos     0 false neg  3977
>          2 against 2   --> false pos     0 false neg  1467
>          2 against 3   --> false pos     0 false neg  3829
>          3 against 0   --> false pos     0 false neg  3923
>          3 against 1   --> false pos     0 false neg  4026
>          3 against 2   --> false pos     0 false neg  4026
>          3 against 3   --> false pos     0 false neg  1505
> 
> Then the same data with latest CVS bogofilter with -Puh
> flag. (i.e. turning off case folding).
> 
> [root at genconf73 db]# perl ./out-crunch out.1
> CONFIG : Mindev 0.100, RobX 0.415
>          0 against 0   --> false pos     0 false neg  1172
>          0 against 1   --> false pos     0 false neg  3283
>          0 against 2   --> false pos     0 false neg  3196
>          0 against 3   --> false pos     0 false neg  3105
>          1 against 0   --> false pos     3 false neg  3123
>          1 against 1   --> false pos     0 false neg  1166
>          1 against 2   --> false pos     2 false neg  3175
>          1 against 3   --> false pos     1 false neg  3042
>          2 against 0   --> false pos     1 false neg  3204
>          2 against 1   --> false pos     0 false neg  3304
>          2 against 2   --> false pos     0 false neg  1189
>          2 against 3   --> false pos     0 false neg  3149
>          3 against 0   --> false pos     1 false neg  3191
>          3 against 1   --> false pos     2 false neg  3285
>          3 against 2   --> false pos     3 false neg  3282
>          3 against 3   --> false pos     0 false neg  1208
> 
> As you can see, there's been a jump in false positives.

Adjusting the spam cutoff to eliminate those is the best way to get a
true comparison between the two runs.  I can always get fewer false
negatives at the expense of more false positives, just by lowering the
spam cutoff.  Since enabling these parameters tends to skew the
distribution of message scores, one needs to eliminate that effect in
order to be sure one's seeing a real change in the error rate.
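
A minimal sketch of that adjustment, in Python.  The scores here are
made up purely for illustration (they are not from anyone's corpus),
the 0.95 starting cutoff is hypothetical, and I'm assuming a message
counts as spam when its score is at or above the cutoff.  The helper
names count_errors and matched_cutoff are likewise invented.

```python
# Sketch: compare two configurations at a matched false-positive count.
# All scores below are invented toy data, not measured results.

def count_errors(ham_scores, spam_scores, cutoff):
    """Return (false positives, false negatives) at the given cutoff."""
    fp = sum(s >= cutoff for s in ham_scores)   # ham scored as spam
    fn = sum(s < cutoff for s in spam_scores)   # spam scored as ham
    return fp, fn

def matched_cutoff(ham_scores, start_cutoff, target_fp, step=0.001):
    """Raise the cutoff from start_cutoff until fp <= target_fp."""
    cutoff = start_cutoff
    while cutoff <= 1.0:
        if sum(s >= cutoff for s in ham_scores) <= target_fp:
            return cutoff
        cutoff = round(cutoff + step, 6)
    raise ValueError("no cutoff <= 1.0 reaches the target fp count")

# Configuration A (baseline) and B (new option), toy scores.
ham_a, spam_a = [0.01, 0.02, 0.10, 0.30, 0.45], [0.40, 0.60, 0.90, 0.99, 1.00]
ham_b, spam_b = [0.02, 0.05, 0.20, 0.50, 0.96], [0.60, 0.96, 0.97, 0.99, 1.00]

CUTOFF = 0.95  # hypothetical shared cutoff
fp_a, fn_a = count_errors(ham_a, spam_a, CUTOFF)
fp_b, fn_b = count_errors(ham_b, spam_b, CUTOFF)
print(f"same cutoff:  A fp={fp_a} fn={fn_a}   B fp={fp_b} fn={fn_b}")

# Raise B's cutoff until it has no more false positives than A.
cutoff_b = matched_cutoff(ham_b, CUTOFF, fp_a)
fp_b2, fn_b2 = count_errors(ham_b, spam_b, cutoff_b)
print(f"matched fp:   A fp={fp_a} fn={fn_a}   B fp={fp_b2} fn={fn_b2} "
      f"(cutoff {cutoff_b})")
```

In the toy data B looks much better than A at the shared cutoff, but
part of that is bought with a false positive; once B's cutoff is raised
to match A's false-positive count, the remaining gap is the part worth
calling a real change.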

> The good news though is the huge drop in false negatives.  This is an
> average drop from 15.6% to 12.7% of total spam volume (or a nearly 20%
> drop in the spam getting through).

Although adjusting the cutoff to remove the false positives will reduce
this, it's likely you'd still see a 10-15% drop in the spam that gets
through; that would be in line with all but one of the experiments
David and I have done.
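
For anyone who wants to check the quoted percentages, here is a short
Python sketch that recomputes them from the two tables above.  It
assumes four exactly equal blocks (84875/4 spam each); the variable and
function names are mine.  Note these are the figures before any cutoff
adjustment, so they say nothing about the 10-15% estimate.

```python
# Sketch: recompute average false-negative rates from the quoted tables.
# Assumes equal-sized blocks; rows = training block, columns = test block.
SPAM_PER_BLOCK = 84875 / 4

fn_default = [
    [1425, 4049, 3977, 3863],
    [3770, 1468, 3873, 3812],
    [3859, 3977, 1467, 3829],
    [3923, 4026, 4026, 1505],
]
fn_nofold = [  # the -Puh run
    [1172, 3283, 3196, 3105],
    [3123, 1166, 3175, 3042],
    [3204, 3304, 1189, 3149],
    [3191, 3285, 3282, 1208],
]

def mean_fn_rate(fn, include_diagonal=True):
    """Average fraction of a block's spam that was missed."""
    cells = [fn[i][j] for i in range(len(fn)) for j in range(len(fn))
             if include_diagonal or i != j]
    return sum(cells) / len(cells) / SPAM_PER_BLOCK

def relative_drop(before, after):
    """Fractional reduction going from before to after."""
    return (before - after) / before

all_before = mean_fn_rate(fn_default)
all_after = mean_fn_rate(fn_nofold)
held_before = mean_fn_rate(fn_default, include_diagonal=False)
held_after = mean_fn_rate(fn_nofold, include_diagonal=False)

print(f"all 16 cells:  {all_before:.1%} -> {all_after:.1%} "
      f"(relative drop {relative_drop(all_before, all_after):.1%})")
print(f"held-out only: {held_before:.1%} -> {held_after:.1%} "
      f"(relative drop {relative_drop(held_before, held_after):.1%})")
```

With all 16 cells this gives the 15.6% and 12.7% quoted, a relative drop
of about 18.5%; leaving out the train-on-self cells changes the rates
but leaves the relative drop at about 18.4%.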

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |