Understanding tuning results

David Relson relson at osagesoftware.com
Thu Jun 5 16:44:41 CEST 2003


At 10:30 AM 6/5/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>Using the new scripts for tuning I got some results:
>
> >           r0     r1     r2
> > sp.mc   2142   2142   2142
> > ns.mc   3713   3714   3713
> >
> > Top 10 results
> > 06/05 14:45:43 1      0.025 fpos...0 at cutoff 0.999999, run0...394  run1...341  run2...377 1112
> > 06/05 14:46:29 1      0.050 fpos...0 at cutoff 0.999997, run0...359  run1...315  run2...345 1019
> > 06/05 14:47:15 1      0.075 fpos...0 at cutoff 0.999997, run0...347  run1...307  run2...334  988
> > 06/05 14:47:58 1      0.100 fpos...0 at cutoff 0.999999, run0...308  run1...282  run2...292  882
> > 06/05 14:48:45 1      0.125 fpos...0 at cutoff 0.999998, run0...294  run1...268  run2...279  841
> > 06/05 14:49:28 1      0.150 fpos...0 at cutoff 0.999998, run0...281  run1...264  run2...270  815
> > 06/05 14:50:14 1      0.175 fpos...0 at cutoff 0.999997, run0...281  run1...262  run2...267  810
> > 06/05 14:51:05 1      0.200 fpos...0 at cutoff 0.999991, run0...252  run1...248  run2...249  749
> > 06/05 14:51:52 1      0.225 fpos...0 at cutoff 0.999968, run0...238  run1...233  run2...235  706
> > 06/05 14:52:37 1      0.250 fpos...0 at cutoff 0.999861, run0...230  run1...225  run2...231  686

There's something wrong here.  The last column should be in ascending order, 
and the list should contain the 10 result lines (printed earlier) with the 
lowest totals.  When I run the script, the order is correct.  Yours is in 
descending order (1112 down to 686).

Can you send me the complete results.MMDD.HHMM.txt file?

>I don't really understand those. What is the number (1)
>after the time? What is the next number?

Remember the tests are for the robs and min_dev parameters.  Those are the 
two numbers after the time, so in your first line robs is 1 and min_dev is 
0.025.  The first line of the "Top 10 results" is where the desired numbers 
should be.
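
To make the field layout concrete, here is a rough Python sketch that splits one result line into its parts.  It is not part of the tuning scripts.  The function name and the regular expression are my own, and it assumes one result per line with robs and min_dev as the two numbers after the time.  In the listing above, the last column equals the sum of the three run counts.

```python
import re

# Assumed layout of one result line, taken from the listing in this thread:
#   date time robs min_dev fpos...N at cutoff C, run0...A run1...B run2...C total
LINE_RE = re.compile(
    r"^(?P<date>\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d)\s+"
    r"(?P<robs>[\d.]+)\s+(?P<min_dev>[\d.]+)\s+"
    r"fpos\.\.\.(?P<fpos>\d+) at cutoff (?P<cutoff>[\d.]+),\s+"
    r"(?P<runs>(?:run\d+\.\.\.\d+\s+)+)(?P<total>\d+)\s*$"
)


def parse_result_line(line):
    """Return the fields of one result line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Collect the per-run counts (run0, run1, ...) in order.
    runs = [int(n) for n in re.findall(r"run\d+\.\.\.(\d+)", m.group("runs"))]
    return {
        "robs": float(m.group("robs")),
        "min_dev": float(m.group("min_dev")),
        "fpos": int(m.group("fpos")),
        "cutoff": float(m.group("cutoff")),
        "runs": runs,
        "total": int(m.group("total")),
    }


sample = ("06/05 14:45:43 1      0.025 fpos...0 at cutoff 0.999999, "
          "run0...394  run1...341  run2...377 1112")
print(parse_result_line(sample))
```
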

>Do I really have to go by time and look the values up above?
>If so, the best would be:
>robx        = 0.415000 (4.15e-01)
>robs        = 1.000000 (1.00e+00)
>min_dev     = 0.100000 (1.00e-01)
>cutoff 0.999861
>
>OK, let me do the following. I take the r[0-2].(ns|sp) and
>check what happens using my real database:
>
>[The config I use now]
>algorithm=fisher
>robs=0.0011
>min_dev=0.025
>ham_cutoff = 0.00
>spam_cutoff = 0.53
>spamicity_tags = Spam, Ham
>spamicity_formats = %0.3f, %0.3f
>header_format = %h: %c, spamicity=%p, version=%v/%a
>bogofilter_dir=/usr/local/pi/bogolists/.bogofilter
>
>Spam:
>    6424 test.spam
>False negatives:
>170
>Ham:
>   11133 test.ham
>False positives:
>1
>
>
>[The settings suggested above]

As mentioned above, it looks like the "Top 10" section is showing the worst 
results (rather than the best).  Look in the results file for the lowest 
value in the last column and try that for your test.
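
If you don't want to scan the file by eye, something like the following Python sketch would do the lookup.  It is only an illustration, not what the tuning script does internally.  The function name is made up, and it assumes each result sits on one line ending with the run counts followed by the total.

```python
import re

# Assumption: a result line ends with "runN...count" followed by the total.
TOTAL_RE = re.compile(r"run\d+\.\.\.\d+\s+(\d+)\s*$")


def lowest_totals(lines, count=10):
    """Return up to `count` (total, line) pairs, lowest total first."""
    scored = []
    for line in lines:
        m = TOTAL_RE.search(line)
        if m:  # skip headers and anything else that is not a result line
            scored.append((int(m.group(1)), line.rstrip()))
    scored.sort(key=lambda pair: pair[0])  # ascending by total
    return scored[:count]


# Three lines copied from the listing in this thread, in their original order.
sample = [
    "06/05 14:45:43 1      0.025 fpos...0 at cutoff 0.999999, "
    "run0...394  run1...341  run2...377 1112",
    "06/05 14:47:15 1      0.075 fpos...0 at cutoff 0.999997, "
    "run0...347  run1...307  run2...334  988",
    "06/05 14:52:37 1      0.250 fpos...0 at cutoff 0.999861, "
    "run0...230  run1...225  run2...231  686",
]

for total, line in lowest_totals(sample, count=2):
    print(total, line)
```

To run it on the real data, pass the open results.MMDD.HHMM.txt file instead of the sample list.  The first pair returned is the line to try.
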


>Now that is a real pain. Something is awfully wrong here.
>
>BTW: My above settings "in production" have shown no mistakes
>whatsoever in the last three days or so.
>
>pi




