tuning.sh [was: bogofilter-0.13.6.3 - new current release]

Fri Jun 20 17:48:56 CEST 2003

Greg Louis <glouis at dynamicro.on.ca> wrote:

>> > The rule of thumb is that $target should be 0.1% to 0.3% of the test set 
>> > size.
>> 
>> For me that would be more than 70 (using .2%). Or do you
>> only count half the size since the other half is used to
>> build the database? Either way, this would still be way too big.
>> 
>YET AGAIN ONCE MORE ANOTHER (and last) TIME:
>
>You need more false positives to run a parameter scan than you want to
>have in production.  

I recall you said the target should lead to a scan that finds a
cutoff not too close to .5 or 1. That failed for targets of 12
and 24. For a target of 3:

 robs   min_dev spam_cutoff  run0 run1 run2 total
0.1000    0.450   0.501000    60   62   64   186
0.0320    0.425   0.524000    86   97   81   264
0.0320    0.450   0.682000   100   93   88   281
0.0100    0.425   0.515000    96  101   87   284
0.1000    0.425   0.573000    91  102   91   284
0.3200    0.450   0.625000    98  112  100   310
0.0100    0.450   0.686000   111  108  103   322
0.0100    0.400   0.515000   112  123  111   346
0.0320    0.400   0.535000   115  125  113   353
0.0100    0.375   0.521000   119  126  125   370

>David is quite right: if you use tuning.sh you
>should set the target somewhere between 0.1% and 0.3% of the total size
>of your test files (r?.ns).  

I see, of a single such file! That gives 7.5 for 0.2%.
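A minimal sketch of that arithmetic. It assumes the percentage is applied to the message count of a single r?.ns file, and it assumes a file of 3750 messages, which is simply what "7.5 at 0.2%" implies (7.5 / 0.002). The helper name target_range is hypothetical and is not part of tuning.sh.

```python
# Hypothetical helper (not part of tuning.sh): convert the 0.1%-0.3%
# rule of thumb into a concrete range of target false-positive counts.

def target_range(messages, low=0.001, high=0.003):
    """Return (low, mid, high) targets for a test file of `messages` messages."""
    mid = (low + high) / 2  # 0.2%, the midpoint used in this thread
    return messages * low, messages * mid, messages * high

# Assumed file size: 7.5 at 0.2% implies 7.5 / 0.002 = 3750 messages.
lo, mid, hi = target_range(3750)
print(f"target range: {lo:.2f} to {hi:.2f} (0.2% gives {mid:.2f})")
```

Under this single-file reading the range runs from about 3.75 to 11.25.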

>Your own results show that 12 is too low,

Contradicting the above.

pi