New version

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Mar 16 18:31:08 CET 2004


Greg Louis wrote:

>>  I'd much rather get it as unsure, and at least have a chance to
>> register it as spam once.  Therefore, the robx ought to be less than the
>> spam_cutoff
> 
> Sorry but this betrays a fundamental misconception on your part.  The
> values of x and the spam cutoff are not to be compared in that way,
> because they are not linearly related _at_all_.  Remember, the score
> that the spam cutoff is compared against is calculated by Fisher's
> method of combining probabilities, not the old Robinson geometric-mean
> thing; a message consisting of ten tokens with fw of 0.532 (smaller
> than the spam_cutoff, although not much so) would still score 0.5637.
> The value of robx is supposed to be a guess at how likely it is that an
> unknown token is to be found in spam.  In my message corpus, that
> likelihood really is around 0.6, so that's what the prior should be.

To save Tom here ;-) If a message has no significant tokens
whatsoever, then robx and the spam_cutoff are compared directly.
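For anyone who wants to check Greg's 0.5637 figure, here is a minimal Python sketch of Fisher's (chi-square) method of combining probabilities as Gary Robinson describes it. It assumes the message score is (1 + H - S) / 2, where H and S are chi-square upper-tail probabilities of -2 Σ ln(fw) and -2 Σ ln(1 - fw) with 2n degrees of freedom. The names `chi2_q` and `fisher_score` are hypothetical, not bogofilter's own. The sketch takes the fw values as given and does not model robs, robx or min_dev.

```python
import math


def chi2_q(x2: float, dof: int) -> float:
    """Upper-tail probability P(chi-square with `dof` degrees of freedom >= x2).

    Uses the Poisson series, which is exact for even `dof` (always the case
    here, since Fisher's method uses 2n degrees of freedom).
    """
    if dof % 2 != 0:
        raise ValueError("series form is only valid for even degrees of freedom")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fws: list[float]) -> float:
    """Combine per-token spamminess values fw into one message score."""
    if not fws:
        raise ValueError("need at least one token")
    dof = 2 * len(fws)
    # Close to 1 when the fw values lean towards spam (fw near 1).
    spamminess = chi2_q(-2.0 * sum(math.log(f) for f in fws), dof)
    # Close to 1 when the fw values lean towards ham (fw near 0).
    hamminess = chi2_q(-2.0 * sum(math.log(1.0 - f) for f in fws), dof)
    return (1.0 + spamminess - hamminess) / 2.0


# Ten tokens, each with fw = 0.532: the score comes out near Greg's 0.5637,
# noticeably above the individual fw values.
print(fisher_score([0.532] * 10))
```

With a single token the upper tail for 2 degrees of freedom is exp(-x/2), so the score equals that token's fw exactly; with more tokens of the same fw the score moves further from 0.5, which is why fw (and hence robx) and the spam_cutoff are not on a linear scale in general.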

pi



