spam cutoff less than neutral?

Tom Anderson tanderso at oac-design.com
Tue Feb 24 08:43:16 CET 2004


On Mon, 2004-02-23 at 11:16, Boris 'pi' Piwinger wrote:
> > However, since 0.5 should theoretically be "unsure",
> 
> I don't subscribe to this point of view. I am not claiming

Being the midpoint between ham (0.0) and spam (1.0), 0.5 ought to be
exactly neutral.  As a standard of proof, that is what the Bayesian
method would suggest as well.  I understand that we do quite a bit to
messages that skews their scores in various ways, but if that is the
case, and a score of 0.5 ends up being treated as spam, then I think the
scores ought to be skewed back toward 1.0 instead.  Just to be
philosophically correct.  Pragmatism may prevail in the end, but it
makes less sense.

> > implications.  This is particularly true if I move spam_cutoff too close
> > to robx. 
> 
> I have that almost the same (I could probably make it
> strictly the same, they differ by .001.

The entire point of robx is to bias new words toward ham, to give them
the benefit of the doubt.  If your cutoff is at or near robx, you're
essentially saying that heretofore unseen words contribute nothing
either way (cutoff equal to robx), or in fact count as evidence of spam
(cutoff below robx).  That can only weaken your database and your
classifications if you do receive ham messages with new words.  You
tempt false positives.
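
To make the robx point concrete, here is a minimal sketch, assuming
Robinson's smoothing f(w) = (robs*robx + n*p) / (robs + n), where n is
the number of times a word has been seen and p is its raw spam
probability.  The function names and every numeric value are
illustrative assumptions, not bogofilter's code or its defaults.

```python
def smoothed_score(raw_p, n, robx, robs):
    """Blend the prior robx (weight robs) with the observed raw_p (weight n)."""
    return (robs * robx + n * raw_p) / (robs + n)


def margin_below_cutoff(word_score, spam_cutoff):
    """How far a score sits below the cutoff; <= 0 means at or past it."""
    return spam_cutoff - word_score


# Illustrative values only (assumptions, not bogofilter defaults).
ROBX = 0.4
ROBS = 0.01

# A word never seen before (n = 0) scores exactly robx.
unseen = smoothed_score(raw_p=0.0, n=0, robx=ROBX, robs=ROBS)

# A word seen many times is dominated by its own statistics.
familiar = smoothed_score(raw_p=0.9, n=100, robx=ROBX, robs=ROBS)

# With the cutoff only .001 above robx, an unseen word has almost no
# headroom; with a cutoff well above 0.5 it keeps the benefit of the doubt.
tight = margin_below_cutoff(unseen, spam_cutoff=ROBX + 0.001)
roomy = margin_below_cutoff(unseen, spam_cutoff=0.9)
```

With n = 0 the observed term vanishes, so the score of an unseen word
is robx no matter what robs is.  That is why the distance between robx
and spam_cutoff is the whole of the benefit of the doubt a new word
gets.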

> > False positives are unacceptable, and heretofore unseen emails
> > need the benefit of the doubt.  Already my spam_cutoff is less than
> > min_dev, which itself seems somewhat hypocritical.
> 
> I don't understand that.

You don't understand that false positives are unacceptable, that new
words require the benefit of the doubt, or that a spam_cutoff less than
min_dev is hypocritical?  I assume the first two are self-explanatory.
The reason I believe a spam_cutoff less than min_dev is hypocritical is
that min_dev is defined as the distance from 0.5 within which words are
too neutral to count toward the classification.  If the total message
scores within that range, then the message itself ought to be
considered too neutral to be classified as either ham or spam.  Cutoffs
by definition ought to be at or outside of the min_dev range.  Else,
min_dev should really be changed to be consistent with your cutoff
philosophy.
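
The consistency rule I am arguing for can be sketched as below.  This
is only my reading of min_dev as a band of half-width min_dev around
0.5; the helper names are hypothetical, and whether the real comparison
is strict or not is an assumption.

```python
NEUTRAL = 0.5


def word_is_ignored(word_score, min_dev):
    """A word inside the band 0.5 +/- min_dev is too neutral to count."""
    return abs(word_score - NEUTRAL) < min_dev


def cutoff_is_consistent(spam_cutoff, min_dev):
    """Consistent, by the argument above, if the cutoff is outside the band."""
    return not word_is_ignored(spam_cutoff, min_dev)


# Illustrative values only (assumptions).
MIN_DEV = 0.1

# A cutoff of 0.45 falls inside 0.4..0.6: a word scoring 0.45 would be
# thrown away as too neutral, yet a message scoring 0.45 is called spam.
inside = cutoff_is_consistent(spam_cutoff=0.45, min_dev=MIN_DEV)

# A cutoff of 0.7 is outside the band, so the two settings agree.
outside = cutoff_is_consistent(spam_cutoff=0.7, min_dev=MIN_DEV)
```

Here inside is False and outside is True.  The other way to restore
consistency is to shrink min_dev until the band no longer contains the
cutoff.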

Tom
