troublesome false negative

Greg Louis glouis at dynamicro.on.ca
Mon Nov 4 12:21:31 CET 2002


On 20021103 (Sun) at 2231:46 -0500, David Relson wrote:

> So, if a message contains lots of words not seen previously, or seen 
> rarely, there'll be lots of words with low spamicity.  This will color the 
> classification and result in a low spamicity message.  One of the 
> differences between Graham and Robinson is that Graham compares 
> goodcount+spamcount to MINIMUM_FREQ, while Robinson doesn't have this 
> check.

Robinson's paper explains why this check is not needed if you use the
f(w) calculation: that is what the s and x parameters are for.
Remember, x (which I currently have set to 0.415 and you to 0.200) is
the probability assigned to an unknown word, and s determines the
weight of x relative to the actual count when the actual count is
low.
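
To make the role of s and x concrete, here is a minimal sketch of the
f(w) calculation, assuming the usual form from Robinson's paper,
f(w) = (s*x + n*p(w)) / (s + n), where p(w) is the raw spamicity
estimate from the counts and n is how often the word has been seen
(goodcount+spamcount in the terms above). The function name and the
value s = 1.0 are illustrative assumptions, not bogofilter's actual
code or settings; x = 0.415 is the value mentioned above.

```python
def robinson_f(p_w, n, s=1.0, x=0.415):
    """Blend the raw estimate p_w with the prior x.

    p_w : raw spamicity of the word from the training counts
    n   : number of times the word has been seen in training
    s   : strength of the prior (assumed 1.0 here for illustration)
    x   : probability assigned to a word never seen before
    """
    return (s * x + n * p_w) / (s + n)


# Never-seen word: n = 0, so the result is exactly x.
unseen = robinson_f(p_w=0.0, n=0)

# Word seen once, only in spam: the raw estimate is 1.0, but with so
# little data the result is pulled halfway back toward x.
rare = robinson_f(p_w=1.0, n=1)

# Word seen 100 times, only in spam: the counts dominate and the
# prior barely matters.
common = robinson_f(p_w=1.0, n=100)

print(unseen, rare, common)
```

With these assumed values a rarely seen word never gets an extreme
score on its own, which is the job MINIMUM_FREQ does in the Graham
calculation.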


-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
