Understanding tuning results
David Relson
relson at osagesoftware.com
Thu Jun 5 16:44:41 CEST 2003
At 10:30 AM 6/5/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>Using the new scripts for tuning I got some results:
>
> > r0 r1 r2
> > sp.mc 2142 2142 2142
> > ns.mc 3713 3714 3713
> >
> > Top 10 results
> > 06/05 14:45:43 1 0.025 fpos...0 at cutoff 0.999999, run0...394 run1...341 run2...377 1112
> > 06/05 14:46:29 1 0.050 fpos...0 at cutoff 0.999997, run0...359 run1...315 run2...345 1019
> > 06/05 14:47:15 1 0.075 fpos...0 at cutoff 0.999997, run0...347 run1...307 run2...334 988
> > 06/05 14:47:58 1 0.100 fpos...0 at cutoff 0.999999, run0...308 run1...282 run2...292 882
> > 06/05 14:48:45 1 0.125 fpos...0 at cutoff 0.999998, run0...294 run1...268 run2...279 841
> > 06/05 14:49:28 1 0.150 fpos...0 at cutoff 0.999998, run0...281 run1...264 run2...270 815
> > 06/05 14:50:14 1 0.175 fpos...0 at cutoff 0.999997, run0...281 run1...262 run2...267 810
> > 06/05 14:51:05 1 0.200 fpos...0 at cutoff 0.999991, run0...252 run1...248 run2...249 749
> > 06/05 14:51:52 1 0.225 fpos...0 at cutoff 0.999968, run0...238 run1...233 run2...235 706
> > 06/05 14:52:37 1 0.250 fpos...0 at cutoff 0.999861, run0...230 run1...225 run2...231 686
There's something wrong here. The last column should be in ascending order,
and the section should list the 10 result lines (printed earlier) with the
lowest totals. When I run the script, the order is correct. Yours is wrong.
Can you send me the complete results.MMDD.HHMM.txt file?
>I don't really understand those. What is the number (1)
>after the time? What is the next number?
Remember, the tests are for the robs and min_dev parameters. Those are the
two numbers after the time. The first line of the "Top 10 results" section
is where the best values should appear.
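To make the layout concrete: judging from the lines quoted above, each result reads date, time, robs, min_dev, the fpos count at the chosen cutoff, one count per run (run0, run1, run2), and a final total. In every quoted line the total is exactly the sum of the three run counts (394 + 341 + 377 = 1112, for example). Below is a minimal Python sketch of a parser for one such line. The regular expression, the Result class and parse_result are hypothetical helpers written for illustration, not part of the tuning scripts, and the field meanings are inferred from the sample output only.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for one unwrapped result line, e.g.
# "06/05 14:45:43 1 0.025 fpos...0 at cutoff 0.999999, run0...394 run1...341 run2...377 1112"
# The field meanings are inferred from the sample output, not from the scripts.
LINE_RE = re.compile(
    r"^(?P<date>\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d) "
    r"(?P<robs>\S+) (?P<min_dev>\S+) "
    r"fpos\.\.\.(?P<fpos>\d+) at cutoff (?P<cutoff>[\d.]+), "
    r"(?P<runs>run\d+\.\.\.\d+(?: run\d+\.\.\.\d+)*) (?P<total>\d+)$"
)


@dataclass
class Result:
    robs: float        # first number after the time
    min_dev: float     # second number after the time
    fpos: int          # the "fpos...N" count at the reported cutoff
    cutoff: float      # cutoff reported for this parameter pair
    runs: list[int]    # one count per run (run0, run1, ...)
    total: int         # last column


def parse_result(line: str) -> Result:
    """Split one result line into its fields; raise ValueError if it doesn't fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    runs = [int(n) for n in re.findall(r"run\d+\.\.\.(\d+)", m["runs"])]
    return Result(
        robs=float(m["robs"]),
        min_dev=float(m["min_dev"]),
        fpos=int(m["fpos"]),
        cutoff=float(m["cutoff"]),
        runs=runs,
        total=int(m["total"]),
    )


r = parse_result(
    "06/05 14:45:43 1 0.025 fpos...0 at cutoff 0.999999, "
    "run0...394 run1...341 run2...377 1112"
)
# In the quoted output the total equals the sum of the per-run counts.
assert sum(r.runs) == r.total
```

Checking sum(r.runs) == r.total on every parsed line is a cheap way to spot a line that was mangled by wrapping or copying.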
>Do I really have to go by time and look the values up above?
>If so, the best would be:
>robx = 0.415000 (4.15e-01)
>robs = 1.000000 (1.00e+00)
>min_dev = 0.100000 (1.00e-01)
>cutoff 0.999861
>
>OK, let me do the following. I take the r[0-2].(ns|sp) files and
>check what happens using my real database:
>
>[The config I use now]
>algorithm=fisher
>robs=0.0011
>min_dev=0.025
>ham_cutoff = 0.00
>spam_cutoff = 0.53
>spamicity_tags = Spam, Ham
>spamicity_formats = %0.3f, %0.3f
>header_format = %h: %c, spamicity=%p, version=%v/%a
>bogofilter_dir=/usr/local/pi/bogolists/.bogofilter
>
>Spam:
> 6424 test.spam
>False negatives:
>170
>Ham:
> 11133 test.ham
>False positives:
>1
>
>
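To compare this run with the tuning output, it can help to turn the raw counts into rates: 170 of 6424 spam is about 2.6% false negatives, and 1 of 11133 ham is under 0.01% false positives. A trivial Python sketch of that arithmetic; error_rate is just an illustrative name, not a bogofilter function.

```python
def error_rate(errors: int, messages: int) -> float:
    """Fraction of the test messages that were misclassified."""
    return errors / messages


# Counts copied from the test run quoted above.
fn_rate = error_rate(170, 6424)  # spam that was not flagged as spam
fp_rate = error_rate(1, 11133)   # ham that was flagged as spam

print(f"false-negative rate: {fn_rate:.2%}")
print(f"false-positive rate: {fp_rate:.3%}")
```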
>[The settings suggested above]
As mentioned above, it looks like the "Top 10" section is showing the worst
results (rather than the best). Look in the results file for the lowest
value in the last column and try those settings in your test.
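A minimal Python sketch of that lookup, assuming each result occupies a single line in the results file (the wrapping in the quoted message looks like mail-client line wrapping) and that the last number on the line is the total. The pattern, Row and best_results are hypothetical names for illustration only; the tuning scripts may produce a format this does not match.

```python
import re
from typing import Iterable, NamedTuple

# Hypothetical pattern for one result line, assuming the layout seen in the
# quoted output: date, time, robs, min_dev, "fpos...N at cutoff C,",
# the per-run counts, and finally the total.
RESULT_RE = re.compile(
    r"^\d\d/\d\d \d\d:\d\d:\d\d (\S+) (\S+) "
    r"fpos\.\.\.(\d+) at cutoff ([\d.]+),.* (\d+)$"
)


class Row(NamedTuple):
    total: int
    robs: float
    min_dev: float
    fpos: int
    cutoff: float


def best_results(lines: Iterable[str], n: int = 10) -> list[Row]:
    """Return the n parsed result lines with the lowest totals, lowest first."""
    rows = []
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m is None:
            continue  # skip headers, tables and anything that isn't a result
        robs, min_dev, fpos, cutoff, total = m.groups()
        rows.append(Row(int(total), float(robs), float(min_dev), int(fpos), float(cutoff)))
    # sorted() is stable, so lines with equal totals keep their file order.
    return sorted(rows, key=lambda row: row.total)[:n]
```

Since iterating over an open file yields its lines, best_results(open("results.MMDD.HHMM.txt")) works directly; the first Row returned holds the robs, min_dev and cutoff worth trying.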
>Now that is a real pain. Something is awfully wrong here.
>
>BTW: My settings above have shown no mistakes whatsoever
>"in production" over the last three days or so.
>
>pi