default parameters - new vs old vs mine
Matt Christian
mattc at visi.com
Tue Mar 30 07:23:14 CEST 2004
David Relson <relson at osagesoftware.com> writes:
> Greetings All,
>
> FWIW, I decided to compare bogofilter's "old" parameter set (as
> presently used in all versions of bogofilter) to the new parameters (as
> found by Greg's mondo/huge bogotune run). For good measure, I also
> included the parameters that I'm currently using on my site (from a
> bogotune run, with some minor (manual) changes).
David, thanks for running these different parameters and reporting on
the results. I believe that I have found discrepancies with some of the
numbers.
> [...] The message counts are 89,330 ham and 74,917 spam.
89330+74917 = 164247 total
> [...] Using the results of the "new" parameters, I was able to see
> that lowering spam_cutoff from 0.99 to 0.90 would not affect the
> number of false positives and that lowering it to 0.70 would duplicate
> the fp counts for the "old" parameters. In the tables below, I've
> included the counts for these 2 additional values of spam_cutoff.
>
> The "accuracy" table shows how many ham were classified as ham, as
> unsure, and as spam, as well as how many spam were classified as ham, as
> unsure, and as spam. A perfect score would have entries only in the
> "hh" (ham scored as ham) and "ss" (spam scored as spam) columns.
>
> Here are the results:
>
> Parameters:
>           robs      robx      min_dev   spam_cutoff  ham_cutoff
> old       0.010000  0.415000  0.100000  0.950000     0.100000
> new-0.99  0.017800  0.520000  0.375000  0.990000     0.450000
> new-0.90  0.017800  0.520000  0.375000  0.900000     0.450000
> new-0.70  0.017800  0.520000  0.375000  0.700000     0.450000
> mine      0.017800  0.549138  0.435000  0.501000     0.376000
>
> Classification Accuracy:
> ver          hh   hu  hs  sh   su     ss
> old       88673  650   7   0  604  74313
88673+650+7+0+604+74313 = 164247 OK matches total
> new-0.99  88965  362   3   2  850  74065
88965+362+3+2+850+74065 = 164247 OK matches total
> new-0.90  88965  362   3   2  549  74366
88965+362+3+2+549+74366 = 164247 OK matches total
> new-0.70  88965  357   7   2  427  74588
88965+357+7+2+427+74588 = 164346 WRONG does not match
> mine      88955  359  16   4  274  74639
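88955+359+16+4+274+74639 = 164247 OK matches total
The new-0.70 row is the odd one out: its ham columns sum to 89,329 (one
short) and its spam columns sum to 75,017 (100 over). Did some numbers
get mixed up there?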
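For anyone who wants to redo the arithmetic, here is a rough sketch in Python. The counts are copied from the quoted table; the names (ROWS, check_row, check_all) are my own and have nothing to do with bogofilter itself. It checks the ham columns and the spam columns separately against the message counts, which narrows down where a row goes wrong.

```python
# Counts copied from the quoted "Classification Accuracy" table.
# Column order per row: hh, hu, hs (ham messages), sh, su, ss (spam messages).
# All names here are hypothetical helpers, not part of bogofilter.
HAM_TOTAL = 89330
SPAM_TOTAL = 74917

ROWS = {
    "old":      (88673, 650, 7, 0, 604, 74313),
    "new-0.99": (88965, 362, 3, 2, 850, 74065),
    "new-0.90": (88965, 362, 3, 2, 549, 74366),
    "new-0.70": (88965, 357, 7, 2, 427, 74588),
    "mine":     (88955, 359, 16, 4, 274, 74639),
}


def check_row(counts):
    """Return (ham_sum, spam_sum, ok) for one table row."""
    ham_sum = sum(counts[:3])    # hh + hu + hs
    spam_sum = sum(counts[3:])   # sh + su + ss
    ok = ham_sum == HAM_TOTAL and spam_sum == SPAM_TOTAL
    return ham_sum, spam_sum, ok


def check_all(rows=ROWS):
    """Check every row and return a dict of name -> (ham_sum, spam_sum, ok)."""
    return {name: check_row(counts) for name, counts in rows.items()}


if __name__ == "__main__":
    for name, (ham_sum, spam_sum, ok) in check_all().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:9s} ham={ham_sum} spam={spam_sum} "
              f"total={ham_sum + spam_sum} {status}")
```

Splitting the check by class is deliberate: a row can have the right grand total by accident if a ham error and a spam error cancel out.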
> My main purpose in doing this was to see how well the new parameters
> compare to the old parameters. Offhand, I'd say they look good and are
> eminently usable.
I would agree, the new parameters look great.
> What does this all mean? That's hard to say. It can be observed that
> the new parameters give the lowest number of false positives, but also
> give more unsures. It can also be observed that my local parameters
> give many more false positives, though the rate is only 1 in 5,000.
> [...]
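The false positive rate and the unsure counts behind that comparison can be recomputed from the table. This is only a sketch in Python with names of my own choosing (ROWS, summarize); the counts are copied from the quoted table, and I have left out new-0.70 because its row does not add up to the message counts.

```python
# hu = ham scored unsure, hs = ham scored spam (false positives),
# su = spam scored unsure. Values copied from the quoted table.
# ROWS and summarize are hypothetical names, not bogofilter features.
HAM_TOTAL = 89330

ROWS = {
    "old":      {"hu": 650, "hs": 7,  "su": 604},
    "new-0.99": {"hu": 362, "hs": 3,  "su": 850},
    "new-0.90": {"hu": 362, "hs": 3,  "su": 549},
    "mine":     {"hu": 359, "hs": 16, "su": 274},
}


def summarize(row, ham_total=HAM_TOTAL):
    """Return (false_positives, unsures, one_in) for one parameter set.

    one_in is the number of ham messages per false positive,
    or infinity when there are no false positives.
    """
    fp = row["hs"]
    unsures = row["hu"] + row["su"]
    one_in = ham_total / fp if fp else float("inf")
    return fp, unsures, one_in


if __name__ == "__main__":
    for name, row in ROWS.items():
        fp, unsures, one_in = summarize(row)
        print(f"{name:9s} fp={fp:3d} (1 in {one_in:,.0f} ham) unsure={unsures}")
```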
FYI I was using the old (default) parameters up until today. I am now
using the new-0.90 parameters for my setup. Running bogofilter 0.17.4.
Thanks,
Matt
--
Matt Christian mattc at visi.com Learn to love and love to learn.
http://www.visi.com/~mattc/ 0111 ftp://ftp.visi.com/users/mattc/
5468652073656372657420697320131b331b2e1b311b341b311b351b39110d0a