spam cutoff less than neutral?

David Relson relson at osagesoftware.com
Mon Feb 23 15:29:20 CET 2004


On 23 Feb 2004 09:19:27 -0500
Tom Anderson wrote:

> robx=0.455
> robs=0.1
> min_dev=0.25
> spam_cutoff=0.55
> ham_cutoff=0.15
> 
> I'm receiving zero false positives (awesome).  I receive on average
> maybe one false negative per day (pretty good).  Upwards of 60 spams
> are successfully classified (nice).  But, I get 20-30 unsures which
> are almost always (>99%) spam.  I use -u, and correct all errors and
> unsures.  Many unsures are of the type I previously identified on this
> list... they are very long and contain many normal English words,
> generally scoring 0.5 and not moving much from that after repeated
> training.
> 
> Given these numbers, I'm tempted to move my spam_cutoff even further
> down.  However, since 0.5 should theoretically be "unsure", I'm
> hesitant to move the spam_cutoff much further due to the philosophical
> implications.  This is particularly true if I move spam_cutoff too
> close to robx.  False positives are unacceptable, and heretofore
> unseen emails need the benefit of the doubt.  Already my spam_cutoff
> is less than min_dev, which itself seems somewhat hypocritical.
> 
> Should I keep my spam_cutoff as is and just continue correcting
> unsures?  Or is it safe to move it into "unsure" territory?  Does
> anyone else have a very low spam_cutoff?  Does it produce any false
> positives? Can tweaking the other numbers push more of these unsures
> into the spam territory without moving spam_cutoff?
> 
> Tom

Tom,

Divide your unsures into two groups - ham and spam.  Look at the highest
scoring ham message.  To avoid false positives, your spam_cutoff needs
to be higher than that message's score.  Similarly, you can raise your
ham_cutoff, as long as it stays below the score of the lowest scoring
spam message, and so lower the unsure count.
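
As a rough illustration of that procedure, here is a minimal Python
sketch.  It is not part of bogofilter; the function name, the margin
parameter and the sample scores are all made up for the example, and it
assumes you have already sorted your unsures by hand and noted each
message's score.

```python
def suggest_cutoffs(unsure_ham_scores, unsure_spam_scores, margin=0.01):
    """Suggest (ham_cutoff, spam_cutoff) from hand-sorted unsure scores.

    Hypothetical helper: both lists must be non-empty, and 'margin' is an
    arbitrary safety gap, not a bogofilter setting.
    """
    highest_ham = max(unsure_ham_scores)
    lowest_spam = min(unsure_spam_scores)

    # spam_cutoff must sit above every ham score to avoid false positives.
    spam_cutoff = round(highest_ham + margin, 3)

    # ham_cutoff must sit below every spam score to avoid false negatives,
    # and it can never exceed spam_cutoff.
    ham_cutoff = min(round(lowest_spam - margin, 3), spam_cutoff)

    return ham_cutoff, spam_cutoff


# Made-up scores for messages that were classified as unsure.
ham_scores = [0.20, 0.31, 0.46]
spam_scores = [0.40, 0.50, 0.52]
ham_cutoff, spam_cutoff = suggest_cutoffs(ham_scores, spam_scores)
print(f"ham_cutoff={ham_cutoff}")
print(f"spam_cutoff={spam_cutoff}")
```

Messages scoring between the two suggested values would still come out
as unsure, so this only narrows the unsure range; it does not remove it.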

Based on bogotune results, I'm presently using:

    ham_cutoff=0.376
    spam_cutoff=0.501

I'm getting several unsures a day.  Compared to several hundred spam
daily, it's not a big deal.
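
To see how a pair of cutoffs splits scores into three classes, here is
a small Python sketch.  It is only an illustration, not bogofilter's
code: the function name is made up, and the handling of scores that fall
exactly on a cutoff (>= for spam, <= for ham) is an assumption.

```python
def classify(score, ham_cutoff=0.376, spam_cutoff=0.501):
    """Return 'spam', 'ham' or 'unsure' for a score between 0 and 1.

    Hypothetical helper; the boundary handling is an assumption.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# A message scoring 0.5 is unsure under both sets of cutoffs in this thread.
print(classify(0.5))                                      # cutoffs above
print(classify(0.5, ham_cutoff=0.15, spam_cutoff=0.55))   # Tom's cutoffs
```

The closer the two cutoffs are to each other, the fewer scores can land
in the unsure range between them.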

David



