troublesome false negative
Greg Louis
glouis at dynamicro.on.ca
Mon Nov 4 12:21:31 CET 2002
On 20021103 (Sun) at 2231:46 -0500, David Relson wrote:
> So, if a message contains lots of words not seen previously, or seen
> rarely, there'll be lots of words with low spamicity. This will color the
> classification and result in a low spamicity message. One of the
> differences between Graham and Robinson is that Graham compares
> goodcount+spamcount to MINIMUM_FREQ, while Robinson doesn't have this
> check.
Robinson's paper explains why, if you use the f(w) calculation, this
check is not needed. This is what the s and x parameters are about.
Remember, x (which I currently have set to 0.415 and you to 0.200) is
the probability assigned to an unknown word, and s determines the
weight of x relative to the actual count when the actual count is
low.
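
As an illustration of the point above, here is a minimal sketch of Robinson's f(w) = (s*x + n*p(w)) / (s + n). The function name robinson_f, the example p(w) of 0.99, and the value s = 1.0 are assumptions chosen for illustration. Only x = 0.415 comes from this message. Here n is taken to be the number of times the word has been seen (goodcount + spamcount in the quoted text).

```python
def robinson_f(p_w, n, s, x):
    """Blend the prior x with the count-based estimate p_w.

    p_w: spamicity estimated from the word's counts alone
    n:   number of times the word has been seen
    s:   strength of the prior, expressed in units of observations
    x:   spamicity assumed for a word that has never been seen
    """
    return (s * x + n * p_w) / (s + n)


X = 0.415  # value quoted in the message
S = 1.0    # hypothetical value, for illustration only

# A word whose raw counts suggest p(w) = 0.99, seen n times.
for n in (0, 1, 5, 50):
    print(n, robinson_f(0.99, n, S, X))
```

With n = 0 the result is exactly x, and as n grows the result moves toward p(w). A rarely seen word therefore stays close to x and carries little weight in the classification, which is why no separate MINIMUM_FREQ cutoff is required.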
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |