default parameters - new vs old vs mine

Matt Christian mattc at visi.com
Tue Mar 30 07:23:14 CEST 2004


David Relson <relson at osagesoftware.com> writes:

> Greetings All,
>
> FWIW, I decided to compare bogofilter's "old" parameter set (as
> presently used in all versions of bogofilter) to the new parameters (as
> found by Greg's mondo/huge bogotune run).  For good measure, I also
> included the parameters that I'm currently using on my site (from a
> bogotune run, with some minor (manual) changes).

David, thanks for running these different parameters and reporting on
the results.  I believe that I have found discrepancies with some of the
numbers.

> [...] The message counts are 89,330 ham and 74,917 spam.

89330+74917 = 164247 total

> [...] Using the results of the "new" parameters, I was able to see
> that lowering spam_cutoff from 0.99 to 0.90 would not affect the
> number of false positives and that lowering it to 0.70 would duplicate
> the fp counts for the "old" parameters.  In the tables below, I've
> included the counts for these 2 additional values of spam_cutoff.
>
> The "accuracy" table shows how many ham were classified as ham, as
> unsure, and as spam, as well as how many spam were classified as ham, as
> unsure, and as spam.  A perfect score would have entries only in the
> "hh" (ham scored as ham) and "ss" (spam scored as spam) columns.
>
> Here are the results:
>
> Parameters:
>            robs     robx    min_dev  spam_co  ham_co
> old      0.010000 0.415000 0.100000 0.950000 0.100000
> new-0.99 0.017800 0.520000 0.375000 0.990000 0.450000
> new-0.90 0.017800 0.520000 0.375000 0.900000 0.450000
> new-0.70 0.017800 0.520000 0.375000 0.700000 0.450000
> mine     0.017800 0.549138 0.435000 0.501000 0.376000
>  
> Classification Accuracy:
> ver          hh     hu     hs     sh     su     ss
> old       88673    650      7      0    604  74313

88673+650+7+0+604+74313 = 164247  OK matches total

> new-0.99  88965    362      3      2    850  74065

88965+362+3+2+850+74065 = 164247  OK matches total

> new-0.90  88965    362      3      2    549  74366

88965+362+3+2+549+74366 = 164247  OK matches total

> new-0.70  88965    357      7      2    427  74588

88965+357+7+2+427+74588 = 164346  WRONG does not match (99 too many)

> mine      88955    359     16      4    274  74639

88955+359+16+4+274+74639 = 164247  OK matches total

So only the new-0.70 row is off.  Did a number get mistyped there?
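The sums above only check the grand total.  Checking the ham columns
(hh+hu+hs) against 89,330 and the spam columns (sh+su+ss) against 74,917
separately narrows down where a miscount sits.  A minimal sketch, assuming
the counts are copied exactly as they appear in the quoted table; the names
check_rows, HAM_TOTAL and SPAM_TOTAL are mine and not part of bogofilter:

```python
# Per-class consistency check for the quoted "Classification Accuracy" table.
# Counts are copied from the quoted rows; helper names are illustrative only.

HAM_TOTAL = 89330   # ham messages in the test corpus
SPAM_TOTAL = 74917  # spam messages in the test corpus

# ver: (hh, hu, hs, sh, su, ss)
ROWS = {
    "old":      (88673, 650, 7, 0, 604, 74313),
    "new-0.99": (88965, 362, 3, 2, 850, 74065),
    "new-0.90": (88965, 362, 3, 2, 549, 74366),
    "new-0.70": (88965, 357, 7, 2, 427, 74588),
    "mine":     (88955, 359, 16, 4, 274, 74639),
}


def check_rows(rows):
    """Return {ver: (ham_diff, spam_diff)}; (0, 0) means the row is consistent."""
    report = {}
    for ver, (hh, hu, hs, sh, su, ss) in rows.items():
        ham_diff = hh + hu + hs - HAM_TOTAL
        spam_diff = sh + su + ss - SPAM_TOTAL
        report[ver] = (ham_diff, spam_diff)
    return report


if __name__ == "__main__":
    for ver, (ham_diff, spam_diff) in check_rows(ROWS).items():
        status = "OK" if (ham_diff, spam_diff) == (0, 0) else "MISMATCH"
        print(f"{ver:9s} ham {ham_diff:+5d}  spam {spam_diff:+5d}  {status}")
```

A nonzero ham or spam difference points at the three columns to recheck
for that row.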

> My main purpose in doing this was to see how well the new parameters
> compare to the old parameters.  Offhand, I'd say they look good and are
> eminently usable.

I would agree, the new parameters look great.

> What does this all mean?  That's hard to say.  It can be observed that
> the new parameters give the lowest number of false positives, but also
> give more unsures.  It can also be observed that my local parameters
> give many more false positives, though the rate is only 1 in 5,000.
> [...] 
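The false positive and unsure figures mentioned above can be worked out
from the hu, hs and su columns.  A rough sketch, assuming the quoted counts
and the 89,330 ham total; the helper name one_in is mine:

```python
# False-positive rate and unsure count per parameter set, from the quoted table.
# Counts are copied from the quoted rows; names here are illustrative only.

HAM_TOTAL = 89330  # ham messages in the test corpus

# ver: (hu, hs, su) = ham unsure, ham scored as spam (fp), spam unsure
COUNTS = {
    "old":      (650, 7, 604),
    "new-0.99": (362, 3, 850),
    "new-0.90": (362, 3, 549),
    "new-0.70": (357, 7, 427),
    "mine":     (359, 16, 274),
}


def one_in(false_positives, ham_total=HAM_TOTAL):
    """Express a false-positive count as 'one in N ham messages'."""
    return ham_total / false_positives


if __name__ == "__main__":
    for ver, (hu, hs, su) in COUNTS.items():
        print(f"{ver:9s} fp {hs:3d} (about 1 in {one_in(hs):,.0f})"
              f"  unsure {hu + su:5d}")
```

The new-0.70 row does not add up to the message counts, so its figures
should be read with that in mind.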

FYI I was using the old (default) parameters up until today.  I am now
using the new-0.90 parameters for my setup.  Running bogofilter 0.17.4.

Thanks,

Matt

-- 
Matt Christian  mattc at visi.com  Learn to love and love to learn.
http://www.visi.com/~mattc/ 0111 ftp://ftp.visi.com/users/mattc/
5468652073656372657420697320131b331b2e1b311b341b311b351b39110d0a



