a source of useful info and a question (correction)

Greg Louis glouis at dynamicro.on.ca
Mon Nov 18 19:41:58 CET 2002


On 20021118 (Mon) at 1322:39 -0500, Greg Louis wrote:
> 
> Both geometric-mean and Fisher-based calculation use
>   scalefactor = badlist_messagecount / goodlist_messagecount
> in bogofilter.  In spambayes, if the goodlist has more tokens,
>   scalefactor = badlist_tokencount / goodlist_tokencount
> and otherwise
>   scalefactor = goodlist_tokencount / badlist_tokencount
> 
> We calculate, for each token from w = 1 to n where n is the number of
> unique tokens in the message,
>   f(w) = (s * x + badcount) / (s + badcount + goodcount * scalefactor)
> and spambayes does something essentially the same.

Of course, in spambayes, if the badlist has more tokens (the
"otherwise" case above), the f(w) denominator is
  (s + badcount * scalefactor + goodcount)
rather than (s + badcount + goodcount * scalefactor).
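
For anyone who wants to compare the two scalings numerically, here is a
rough Python sketch of the formulas as described in this thread. It is not
taken from either code base: the function names and the default values for
s and x are placeholders of mine, and in the spambayes-style branch the
numerator is left as (s * x + badcount) because only the denominator is
given above, so treat that part as an assumption.

```python
def f_w_bogofilter(badcount, goodcount, bad_msgs, good_msgs, s=1.0, x=0.5):
    """f(w) with the goodlist count scaled by the message-count ratio.

    s and x defaults are illustrative placeholders, not real settings.
    """
    scalefactor = bad_msgs / good_msgs
    return (s * x + badcount) / (s + badcount + goodcount * scalefactor)


def f_w_spambayes_style(badcount, goodcount, bad_tokens, good_tokens,
                        s=1.0, x=0.5):
    """f(w) with the larger list scaled down by the token-count ratio.

    Only the denominators are described in the thread; keeping the
    numerator unchanged in both branches is an assumption.
    """
    if good_tokens > bad_tokens:
        # Goodlist is larger: shrink goodcount.
        scalefactor = bad_tokens / good_tokens
        denominator = s + badcount + goodcount * scalefactor
    else:
        # Badlist is larger (or equal): shrink badcount instead.
        scalefactor = good_tokens / bad_tokens
        denominator = s + badcount * scalefactor + goodcount
    return (s * x + badcount) / denominator


if __name__ == "__main__":
    # Made-up counts, purely to exercise both branches.
    print(f_w_bogofilter(10, 5, bad_msgs=1000, good_msgs=2000))
    print(f_w_spambayes_style(10, 5, bad_tokens=2000, good_tokens=4000))
    print(f_w_spambayes_style(10, 5, bad_tokens=4000, good_tokens=2000))
```

With these made-up counts the first two calls use the same scalefactor
(0.5) on the goodlist side and so agree; the third scales badcount in the
denominator instead.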

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |



