New tuning.sh

Mon Jun 16 12:56:14 CEST 2003

Boris 'pi' Piwinger wrote:

> Top 10 results:
>  robs   min_dev spam_cutoff  run0 run1 run2 total
> 0.0100   0.450    0.500000    35   32   32    99
> 0.1000   0.425    0.500000    43   44   46   133
> 0.1000   0.450    0.500000    41   48   51   140
> 0.0320   0.450    0.501000    48   55   56   159
> 0.0320   0.400    0.500000    55   56   53   164
> 0.0320   0.375    0.500000    62   64   67   193
> 0.1000   0.400    0.500000    59   68   67   194
> 0.0320   0.425    0.503000    64   70   68   202
> 0.0100   0.425    0.501000    64   70   70   204
> 0.3200   0.450    0.550000    82   83   93   258
> 
> I used target=6 so results have 1 or 2 fp. Actually, those
> in the top10 only have one.

With target=3:

 robs   min_dev spam_cutoff  run0 run1 run2 total
0.1000   0.450    0.501000    51   59   66   176
0.1000   0.425    0.525000    70   82   81   233
0.0320   0.425    0.528000    73   85   88   246
0.0320   0.450    0.681000    94   96   92   282
0.3200   0.450    0.607000    91   94  101   286
0.0320   0.400    0.511000    94   97  104   295
0.0100   0.425    0.537000    99   97  106   302
0.0100   0.450    0.685000   104  110  104   318
0.1000   0.400    0.545000   106  109  111   326
0.0100   0.400    0.517000   106  104  119   329

These all have 0 fp, but the full result list includes runs
with up to two. That is quite a difference from the target=6
results.
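
As a sanity check on the target=3 table above, here is a minimal
Python sketch (not part of tuning.sh) that parses the rows, confirms
that each total is the sum of run0..run2, and confirms that the list
is ordered by ascending total. The names parse_table,
totals_consistent and is_ranked are made up for this illustration.

```python
# Rows copied verbatim from the target=3 table above.
TARGET3_TABLE = """\
0.1000   0.450    0.501000    51   59   66   176
0.1000   0.425    0.525000    70   82   81   233
0.0320   0.425    0.528000    73   85   88   246
0.0320   0.450    0.681000    94   96   92   282
0.3200   0.450    0.607000    91   94  101   286
0.0320   0.400    0.511000    94   97  104   295
0.0100   0.425    0.537000    99   97  106   302
0.0100   0.450    0.685000   104  110  104   318
0.1000   0.400    0.545000   106  109  111   326
0.0100   0.400    0.517000   106  104  119   329
"""


def parse_table(text):
    """Turn whitespace-separated result rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        robs, min_dev, spam_cutoff = (float(f) for f in fields[:3])
        run0, run1, run2, total = (int(f) for f in fields[3:])
        rows.append({
            "robs": robs,
            "min_dev": min_dev,
            "spam_cutoff": spam_cutoff,
            "runs": (run0, run1, run2),
            "total": total,
        })
    return rows


def totals_consistent(rows):
    """True if every total equals the sum of its three runs."""
    return all(sum(r["runs"]) == r["total"] for r in rows)


def is_ranked(rows):
    """True if rows are listed in order of ascending total."""
    totals = [r["total"] for r in rows]
    return totals == sorted(totals)


if __name__ == "__main__":
    rows = parse_table(TARGET3_TABLE)
    print("rows parsed:", len(rows))
    print("totals consistent:", totals_consistent(rows))
    print("ranked by total:", is_ranked(rows))
```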

Using 0.5 seems too risky to me according to earlier tests.
So I currently use .501 as spam_cutoff. With min_dev=.45,
robs should be .1 or .01 according to the tuning results. So
I tested with my current production database and my
complete training archive (which is bigger):

robs=0.01
spam_cutoff=0.501
Spam:
  13702 test.spam
False negatives:
43
Ham:
  22387 test.ham
False positives:
0

robs=0.1
spam_cutoff=0.501
Spam:
  13702 test.spam
False negatives:
100
Ham:
  22387 test.ham
False positives:
0

robs=0.01
spam_cutoff=0.5
Spam:
  13702 test.spam
False negatives:
16
Ham:
  22387 test.ham
False positives:
2

robs=0.1
spam_cutoff=0.5
Spam:
  13702 test.spam
False negatives:
82
Ham:
  22387 test.ham
False positives:
2

So I will stay with robs=0.01 and spam_cutoff=0.501.
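
To make the comparison easier to read, the four runs above can be
turned into error rates. This is only a sketch in Python with the
counts copied from the output above; error_rates and best_zero_fp are
names invented here, and "fewest false negatives among settings with
no false positives" is my reading of the selection rule, not
something tuning.sh does.

```python
# Message counts from the test runs above.
SPAM_TOTAL = 13702
HAM_TOTAL = 22387

# (robs, spam_cutoff) -> (false negatives, false positives)
RESULTS = {
    (0.01, 0.501): (43, 0),
    (0.1, 0.501): (100, 0),
    (0.01, 0.5): (16, 2),
    (0.1, 0.5): (82, 2),
}


def error_rates(fn, fp, spam_total=SPAM_TOTAL, ham_total=HAM_TOTAL):
    """Return (false-negative %, false-positive %) for one run."""
    return 100.0 * fn / spam_total, 100.0 * fp / ham_total


def best_zero_fp(results):
    """Setting with the fewest false negatives among those with 0 fp."""
    candidates = {k: v for k, v in results.items() if v[1] == 0}
    return min(candidates, key=lambda k: candidates[k][0])


if __name__ == "__main__":
    for (robs, cutoff), (fn, fp) in RESULTS.items():
        fn_pct, fp_pct = error_rates(fn, fp)
        print(f"robs={robs} spam_cutoff={cutoff} "
              f"fn={fn_pct:.3f}% fp={fp_pct:.4f}%")
    print("best with 0 fp:", best_zero_fp(RESULTS))
```

Under that rule the sketch picks robs=0.01 with spam_cutoff=0.501,
the same setting chosen above.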

pi