spam cutoff less than neutral?

Tue Feb 24 09:03:01 CET 2004

Tom Anderson <tanderso at oac-design.com> wrote:

>On Mon, 2004-02-23 at 11:16, Boris 'pi' Piwinger wrote:
>> > However, since 0.5 should theoretically be "unsure",
>> 
>> I don't subscribe to this point of view. I am not claiming
>
>Being the mean between spam (1.0) and ham (0.0), it ought to be exactly
>neutral.  

What does "neutral" mean? A score of .5 means that the two
tests explained in the man page give the same result.
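
A rough sketch of what "the same result" means, assuming the
two tests are the Fisher-style chi-square combinations from
Robinson's method. The function names are mine, not
bogofilter's, and the score for an empty token list is an
arbitrary choice for illustration:

```python
import math

def chi2_sf_even(x, dof):
    """Chi-square survival function, valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)
    total = term
    # Closed form: exp(-m) * sum_{i=0}^{dof/2 - 1} m^i / i!
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combined_score(token_scores):
    """Combine per-token scores (each strictly between 0 and 1)."""
    n = len(token_scores)
    if n == 0:
        return 0.5  # assumption: no evidence either way
    # Test 1: small when the tokens look like ham (scores near 0).
    h = chi2_sf_even(-2.0 * sum(math.log(f) for f in token_scores), 2 * n)
    # Test 2: small when the tokens look like spam (scores near 1).
    s = chi2_sf_even(-2.0 * sum(math.log(1.0 - f) for f in token_scores), 2 * n)
    # Both tests agree exactly when h == s, which yields 0.5.
    return (1.0 + h - s) / 2.0
```

With mirrored evidence such as combined_score([0.2, 0.8]) both
tests return the same value and the score is .5. Nothing in
that says the message is "neutral" in any deeper sense.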

What really happens depends on a lot of settings. With robx
and robs you can move the values of single tokens, which in
effect moves those test results.
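
To illustrate, here is a minimal sketch of Robinson's
smoothing formula, f(w) = (s*x + n*p) / (s + n), with x = robx
and s = robs. The raw estimate p below is a simplification
(plain spam fraction, assuming equally sized spam and ham
corpora), and the function name and parameter values are
made up for the example:

```python
def smoothed_token_score(spam_count, ham_count, robx, robs):
    """Robinson's f(w): pull the raw token estimate toward robx.

    robx is the score assumed for a never-seen token, robs is the
    weight given to that assumption.
    """
    n = spam_count + ham_count
    # Simplified raw estimate; the value is irrelevant when n == 0
    # because it is multiplied by n below.
    p = spam_count / n if n > 0 else 0.0
    return (robs * robx + n * p) / (robs + n)

# A token never seen before scores exactly robx.
unseen = smoothed_token_score(0, 0, robx=0.4, robs=1.0)

# The same counts land closer to robx when robs is larger.
weak_prior = smoothed_token_score(10, 0, robx=0.4, robs=1.0)
strong_prior = smoothed_token_score(10, 0, robx=0.4, robs=10.0)
```

Here weak_prior is about 0.945 and strong_prior is 0.7, so the
same training data gives quite different token values, and
with them different test results.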

The way you train does have an effect on those values as
well. I can easily set my spam_cutoff by training to
exhaustion. Probably at extreme values I would get into
trouble, but that's it.

>When used as a standard of proof, this is what the Bayesian
>method would suggest as well.  

We are not doing Bayes.

>> > implications.  This is particularly true if I move spam_cutoff too close
>> > to robx. 
>> 
>> I have that almost the same (I could probably make it
>> strictly the same; they differ by .001).
>
>The entire point of robx is to bias new words as ham... to give them the
>benefit of the doubt.  If your cutoff is at or near robx, you're
>essentially saying that heretofore unseen words contribute nothing
>toward the spamicity, or even in fact bias as spam.  

Right. In fact they don't contribute for me, they are well
within the min_dev interval.
http://piology.org/bogofilter/.bogofilter.cf
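
A small sketch of that effect, assuming min_dev simply drops
every token whose score lies closer than min_dev to 0.5.
Whether the boundary itself is kept or dropped is a guess on
my part, and the numbers are invented, not the ones from my
config:

```python
def significant_tokens(token_scores, min_dev):
    """Keep only scores at least min_dev away from 0.5."""
    return [f for f in token_scores if abs(f - 0.5) >= min_dev]

robx = 0.45      # hypothetical score of a never-seen token
min_dev = 0.1    # hypothetical exclusion radius around 0.5

# One new word (scored robx), two known words, one near-neutral word.
scores = [robx, 0.95, 0.02, 0.55]
kept = significant_tokens(scores, min_dev)
```

The new word and the near-neutral word fall inside the
interval, so kept holds only 0.95 and 0.02. The message is
then judged on the words that were seen in training.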

>This can only serve
>to weaken your database/classifications if in fact you receive ham
>messages with new words.  

I cannot see that. There ain't no such thing as a message
with new words only.

>> > False positives are unacceptable, and heretofore unseen emails
>> > need the benefit of the doubt.  Already my spam_cutoff is less than
>> > min_dev, which itself seems somewhat hypocritical.
>> 
>> I don't understand that.
>
>You don't understand that false positives are unacceptable, 

I do. I did not understand what spam_cutoff less than
min_dev should mean.

>Since I assume the first two to be
>self-explanatory, the reason I believe having a spam_cutoff less than
>min_dev is hypocritical is because min_dev is defined as the range from
>0.5 at which words are too neutral to be considered toward the
>classification.  

So most likely your min_dev is below spam_cutoff. What are
your values?

>If the total message scores within that range, then the
>message itself ought to be considered too neutral to be considered as
>either ham or spam.  

That is decided with robx and spam_cutoff, not min_dev (only
indirectly).

>Cutoffs by definition ought to be at or outside of
>the min_dev range.  

Not at all.

>Else, min_dev should really be changed to be
>consistent with your cutoff philosophy.

It is absolutely consistent. I still don't get your point.

pi