New version
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Tue Mar 16 18:31:08 CET 2004
Greg Louis wrote:
>> I'd much rather get it as unsure, and at least have a chance to
>> register it as spam once. Therefore, the robx ought to be less than the
>> spam_cutoff
>
> Sorry but this betrays a fundamental misconception on your part. The
> values of x and the spam cutoff are not to be compared in that way,
> because they are not linearly related _at_all_. Remember, the score
> that the spam cutoff is compared against is calculated by Fisher's
> method of combining probabilities, not the old Robinson geometric-mean
> thing; a message consisting of ten tokens with fw of 0.532 (smaller
> than the spam_cutoff, although not much so) would still score 0.5637.
> The value of robx is supposed to be a guess at how likely it is that an
> unknown token is to be found in spam. In my message corpus, that
> likelihood really is around 0.6, so that's what the prior should be.
To save Tom here ;-) If a message has no significant tokens
whatsoever, then robx and spam_cutoff are compared directly.
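
For anyone who wants to check the quoted figure, here is a rough Python sketch of the Fisher-style combination as I understand it. This is not bogofilter's actual code. The function names are made up for illustration, and the fallback to robx for a message with no usable tokens is an assumption that simply encodes the remark above. It should land close to the 0.5637 quoted for ten tokens with fw of 0.532.

```python
import math


def chi2_survival(x, df):
    """P(X > x) for a chi-square variable with an even number of
    degrees of freedom, using the closed-form series for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)          # i = 0 term of the series
    total = term
    for i in range(1, df // 2):  # add terms m^i / i! * exp(-m)
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fws, robx=0.6):
    """Combine per-token values fw (each strictly between 0 and 1)
    into one score via Fisher's method, (1 + H - S) / 2.

    ASSUMPTION: with no tokens at all, the score is taken to be robx,
    which is the case in which robx and the cutoff meet directly.
    """
    if not fws:
        return robx
    n = len(fws)
    h = chi2_survival(-2.0 * sum(math.log(f) for f in fws), 2 * n)
    s = chi2_survival(-2.0 * sum(math.log(1.0 - f) for f in fws), 2 * n)
    return (1.0 + h - s) / 2.0


# Ten tokens, each with fw = 0.532, as in the quoted example.
print(fisher_score([0.532] * 10))
# No tokens: under the assumption above, the score is just robx.
print(fisher_score([], robx=0.6))
```

The first call shows why fw values and the cutoff are not on the same scale once several tokens are combined. The second is the degenerate case where they are.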
pi