default parameters - new vs old vs mine
Tom Anderson
tanderso at oac-design.com
Tue Mar 30 16:04:32 CEST 2004
On Mon, 2004-03-29 at 20:19, David Relson wrote:
> Parameters:
>            robs      robx      min_dev   spam_co   ham_co
> old        0.010000  0.415000  0.100000  0.950000  0.100000
> new-0.99   0.017800  0.520000  0.375000  0.990000  0.450000
> new-0.90   0.017800  0.520000  0.375000  0.900000  0.450000
> new-0.70   0.017800  0.520000  0.375000  0.700000  0.450000
> mine       0.017800  0.549138  0.435000  0.501000  0.376000
>
> Classification Accuracy:
> ver        hh     hu   hs  sh  su   ss
> old        88673  650   7   0  604  74313
> new-0.99   88965  362   3   2  850  74065
> new-0.90   88965  362   3   2  549  74366
> new-0.70   88965  357   7   2  427  74588
> mine       88955  359  16   4  274  74639
>
Indeed, the false positives seem to be reduced from the "old" and "mine"
values. It's hard to believe that 7 hams scored over 0.7 though, and 3
of them over 0.99. I haven't had one yet score over 0.15, even those of the
commercial variety (buy.com, cnet, bankrate, etc.). Are you sure you
classified them correctly? I'd love to see what such a spammy ham looks
like.
The ham unsures are way too high for my taste. My "hu" share of total
unsures is under 0.03% (assuming 1 hu, since I have none and can't
divide zero meaningfully) against your 30-60%. My numbers skew heavily
toward false negatives, where yours are fairly balanced. I couldn't
possibly accept a false positive rate higher than a false negative
rate, even on the order of 0.01% vs 0.0025%. I'd much rather reverse
that ratio. On the other hand, I receive 1-2 fn's per day, and 8-10
su's, but IMHO that's more than worth zero fp's and hu's.
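
For anyone who wants to check the percentages against the quoted table,
here is a rough Python sketch. The counts are copied from the table; the
names RESULTS and rates() are my own, and I'm assuming hs counts as a
false positive, sh as a false negative, and that the "hu" share is
hu / (hu + su).

```python
# Counts copied from the quoted "Classification Accuracy" table.
# Columns: hh, hu, hs (ham scored as ham/unsure/spam),
#          sh, su, ss (spam scored as ham/unsure/spam).
RESULTS = {
    "old":      (88673, 650, 7, 0, 604, 74313),
    "new-0.99": (88965, 362, 3, 2, 850, 74065),
    "new-0.90": (88965, 362, 3, 2, 549, 74366),
    "new-0.70": (88965, 357, 7, 2, 427, 74588),
    "mine":     (88955, 359, 16, 4, 274, 74639),
}


def rates(hh, hu, hs, sh, su, ss):
    """Return fp rate, fn rate and ham share of unsures, all in percent."""
    ham_total = hh + hu + hs
    spam_total = sh + su + ss
    return {
        "fp_pct": 100.0 * hs / ham_total,         # ham classified as spam
        "fn_pct": 100.0 * sh / spam_total,        # spam classified as ham
        "hu_share_pct": 100.0 * hu / (hu + su),   # ham among all unsures
    }


for name, counts in RESULTS.items():
    r = rates(*counts)
    print(f"{name:<9} fp={r['fp_pct']:.4f}%  fn={r['fn_pct']:.4f}%  "
          f"hu/unsures={r['hu_share_pct']:.1f}%")
```

The point of splitting it this way is that fp and fn are each taken
relative to their own class total, so the two rates can be compared
directly.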
Just off-hand, I would suggest decreasing robx and increasing robs to
bias scoring more toward ham. But that's just based on my experience.
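
To illustrate why those two knobs shift the bias, here is a minimal
sketch. It assumes robs and robx play the role of s and x in Robinson's
smoothing formula f(w) = (s*x + n*p(w)) / (s + n), where n is how often
the token was seen and p(w) its raw spam probability. The function name
robinson_f and the alternative values robs=0.1, robx=0.4 are
hypothetical, picked only for illustration.

```python
def robinson_f(p, n, robs, robx):
    """Smoothed spamicity of a token.

    p    -- raw spam probability of the token (0.0 to 1.0)
    n    -- number of times the token has been seen
    robs -- assumed to be Robinson's s (weight given to the prior)
    robx -- assumed to be Robinson's x (score of a never-seen token)
    """
    return (robs * robx + n * p) / (robs + n)


# A token seen exactly once, and only in spam (p = 1.0).
new_defaults = robinson_f(1.0, 1, robs=0.0178, robx=0.52)

# Hypothetical alternative: stronger prior (larger robs), lower robx.
more_hammy = robinson_f(1.0, 1, robs=0.1, robx=0.4)

print(f"new defaults: {new_defaults:.4f}")
print(f"higher robs, lower robx: {more_hammy:.4f}")
```

With a lower robx and a larger robs, a rarely seen token is pulled
harder toward a value below 0.5, so a single spammy sighting counts for
less.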
Tom