New version
Greg Louis
glouis at dynamicro.on.ca
Tue Mar 16 18:55:54 CET 2004
On 20040316 (Tue) at 1831:08 +0100, Boris 'pi' Piwinger wrote:
> Greg Louis wrote:
>
> >> I'd much rather get it as unsure, and at least have a chance to
> >> register it as spam once. Therefore, the robx ought to be less than the
> >> spam_cutoff
> >
> > Sorry but this betrays a fundamental misconception on your part. The
> > values of x and the spam cutoff are not to be compared in that way,
> > because they are not linearly related _at_all_. Remember, the score
> > that the spam cutoff is compared against is calculated by Fisher's
> > method of combining probabilities, not the old Robinson geometric-mean
> > thing; a message consisting of ten tokens with fw of 0.532 (smaller
> > than the spam_cutoff, although not much so) would still score 0.5637.
> > The value of robx is supposed to be a guess at how likely it is that an
> > unknown token is to be found in spam. In my message corpus, that
> > likelihood really is around 0.6, so that's what the prior should be.
>
> To save Tom here;-) If you have a message with no
> significant token whatsoever, then they are directly compared.
>
That's only true if every token's fw is within min_dev of 0.5. If you
have any unknowns and x is outside 0.5 +/- min_dev, it's not true. But
yes, if you want an even worse straw man than Tom's all-unknowns
message ;) an all-0.5 message will be scored at robx and (in my case)
classed as spam.
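
For anyone who wants to check the arithmetic, here is a rough sketch of
the scoring as described above. It assumes the usual Robinson-Fisher
form (two chi-square tail probabilities over the significant tokens,
combined as (1 + a - b) / 2) and a fall-back to robx when no token is
outside min_dev. The function names, the default values and the exact
comparison against min_dev are made up for illustration; this is not
bogofilter's actual code.

```python
import math


def chi2_sf_even(x, dof):
    """Upper-tail probability of a chi-square variable, even dof only.

    Uses the closed form exp(-m) * sum_{i<dof/2} m**i / i!, with m = x/2.
    Fine for small examples; exp(-m) underflows for very large x.
    """
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fws, robx=0.6, min_dev=0.1):
    """Combine per-token fw values (each strictly between 0 and 1).

    Hypothetical helper: tokens with |fw - 0.5| < min_dev are ignored,
    and a message with no remaining tokens is scored at robx.
    """
    significant = [f for f in fws if abs(f - 0.5) >= min_dev]
    if not significant:
        return robx
    n = len(significant)
    # Tail probability built from the fw values: large when fw is high.
    tail_f = chi2_sf_even(-2.0 * sum(math.log(f) for f in significant), 2 * n)
    # Tail probability built from 1 - fw: large when fw is low.
    tail_not_f = chi2_sf_even(
        -2.0 * sum(math.log(1.0 - f) for f in significant), 2 * n
    )
    return (1.0 + tail_f - tail_not_f) / 2.0


# Ten tokens at fw = 0.532, with min_dev set to 0 so that all of them count.
ten_tokens = fisher_score([0.532] * 10, min_dev=0.0)

# Every token at exactly 0.5: nothing is significant, so the score is robx.
all_neutral = fisher_score([0.5] * 10, robx=0.6, min_dev=0.1)
```

With these assumptions ten_tokens comes out close to the 0.5637 quoted
above, and all_neutral is simply robx, which is why such a message ends
up on whichever side of the spam cutoff robx happens to be.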
--
| G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
| http://www.bgl.nu/~glouis | (on my website or any keyserver) |
| http://wecanstopspam.org in signatures helps fight junk email. |