default parameters - new vs old vs mine

Tom Anderson tanderso at oac-design.com
Tue Mar 30 16:04:32 CEST 2004


On Mon, 2004-03-29 at 20:19, David Relson wrote:
> Parameters:
>            robs     robx    min_dev  spam_co  ham_co
> old      0.010000 0.415000 0.100000 0.950000 0.100000
> new-0.99 0.017800 0.520000 0.375000 0.990000 0.450000
> new-0.90 0.017800 0.520000 0.375000 0.900000 0.450000
> new-0.70 0.017800 0.520000 0.375000 0.700000 0.450000
> mine     0.017800 0.549138 0.435000 0.501000 0.376000
>  
> Classification Accuracy:
> ver          hh     hu     hs     sh     su     ss
> old       88673    650      7      0    604  74313
> new-0.99  88965    362      3      2    850  74065
> new-0.90  88965    362      3      2    549  74366
> new-0.70  88965    357      7      2    427  74588
> mine      88955    359     16      4    274  74639
> 

Indeed, the false positives seem to be reduced from the "old" and "mine"
values.  It's hard to believe that 4 hams scored over 0.7 though, and 3
over 0.99.  I haven't had one yet score over 0.15, even those of the
commercial variety (buy.com, cnet, bankrate, etc.).  Are you sure you
classified them correctly?  I'd love to see what such a spammy ham looks
like.

The ham unsures are way too high for my taste.  My "hu" rate is less
than 0.03% of my total unsures (I've assumed 1 hu, since I've actually
had none and can't divide zero meaningfully), compared to your 30-60%
of total unsures.  My numbers skew heavily in favor of false negatives,
where yours are fairly balanced.  I couldn't possibly accept a false
positive rate higher than a false negative rate, even on the order of
0.01% vs 0.0025%.  I'd much rather reverse that ratio.  On the other
hand, I receive 1-2 fn's per day, and 8-10 su's, but IMHO that's more
than worth zero fp's and hu's.
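
For anyone who wants to check the ratios being compared here, the
following is a rough sketch that derives them from the quoted table.
It assumes the column names mean "true class, then verdict" (so hs is
ham classified as spam, i.e. a false positive, and sh is a false
negative); the function and key names are made up for illustration.

```python
# Counts copied from the quoted "Classification Accuracy" table.
# Assumption: first letter = true class (h=ham, s=spam),
# second letter = verdict (h=ham, u=unsure, s=spam).
COUNTS = {
    "old":      dict(hh=88673, hu=650, hs=7,  sh=0, su=604, ss=74313),
    "new-0.99": dict(hh=88965, hu=362, hs=3,  sh=2, su=850, ss=74065),
    "new-0.90": dict(hh=88965, hu=362, hs=3,  sh=2, su=549, ss=74366),
    "new-0.70": dict(hh=88965, hu=357, hs=7,  sh=2, su=427, ss=74588),
    "mine":     dict(hh=88955, hu=359, hs=16, sh=4, su=274, ss=74639),
}


def rates(c):
    """Return percentages derived from one row of counts."""
    ham_total = c["hh"] + c["hu"] + c["hs"]
    spam_total = c["sh"] + c["su"] + c["ss"]
    return {
        # Ham classified as spam, as a share of all ham.
        "fp_pct": 100.0 * c["hs"] / ham_total,
        # Spam classified as ham, as a share of all spam.
        "fn_pct": 100.0 * c["sh"] / spam_total,
        # Ham unsures as a share of all unsures (ham + spam).
        "hu_share_pct": 100.0 * c["hu"] / (c["hu"] + c["su"]),
    }


for name, c in COUNTS.items():
    r = rates(c)
    print(f"{name:9s} fp={r['fp_pct']:.4f}%  fn={r['fn_pct']:.4f}%  "
          f"hu/unsures={r['hu_share_pct']:.1f}%")
```

The hu/unsures column is the figure the "30-60%" remark refers to;
fp and fn are taken over all ham and all spam respectively.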

Just off-hand, I would suggest decreasing robx and increasing robs to
bias it further toward ham.  But that's just based on my experience.
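
To make the effect of that suggestion concrete, here is a small sketch
of Robinson's smoothing formula, f(w) = (s*x + n*p(w)) / (s + n),
taking robs as s and robx as x.  The token (seen twice, raw spam
probability 0.9) and the "adjusted" values are hypothetical and only
illustrate the direction of the change; they are not recommended
settings.

```python
def robinson_f(p, n, robs, robx):
    """Robinson's smoothed token score: f(w) = (s*x + n*p) / (s + n).

    p    -- raw spam probability of the token from the wordlist counts
    n    -- number of times the token has been seen
    robs -- weight given to the prior (s)
    robx -- score assumed for a token never seen before (x)
    """
    return (robs * robx + n * p) / (robs + n)


# Hypothetical token: seen twice, raw spam probability 0.9.
p, n = 0.9, 2

# robs/robx from the "new" rows of the quoted table.
baseline = robinson_f(p, n, robs=0.0178, robx=0.52)

# Illustrative only: a larger robs and a smaller robx.
adjusted = robinson_f(p, n, robs=0.1, robx=0.415)

# A token with no history (n = 0) scores robx, up to float rounding.
unseen = robinson_f(0.0, 0, robs=0.1, robx=0.415)

print(f"baseline={baseline:.4f} adjusted={adjusted:.4f} unseen={unseen:.4f}")
```

A lower robx pulls unseen and rarely seen tokens toward the ham end,
and a higher robs makes that prior count for more until a token has
been seen many times.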

Tom
