Spam / ham registration issue

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Mar 3 14:33:00 CET 2004


Tig wrote:

> Thanks heaps for the reply people. My understanding now is: Each word
> in the test case has been registered as spam and ham, so therefore
> balance out and give a neutral result. It does not matter how many
> times a word is registered as spam or ham, just the fact that it has
> been recorded as either or both.
> 
> Would this be a correct summary?

No, the key point here is that the tokens in question have
been found in *every* single ham and *every* single spam, so
we have pgood and pbad both equal to one.

Increasing n will only make the overall result approach 0.5
(so Tom was wrong that it would ever become a significant
token; it becomes more and more insignificant).
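
To illustrate the limit, here is a minimal sketch. It assumes
the Robinson-style token score f(w) = (s*x + n*p(w)) / (s + n)
with p(w) = pbad / (pgood + pbad), and it assumes n is the
number of times the token has been seen. The function names
and the values of s and x are made up for the illustration;
they are not bogofilter's code or its defaults.

```python
def token_probability(pgood, pbad):
    """p(w): fraction of the token's evidence that points to spam."""
    return pbad / (pgood + pbad)


def token_score(n, pgood, pbad, s=1.0, x=0.4):
    """Robinson-style score f(w) = (s*x + n*p(w)) / (s + n).

    s (weight of the prior) and x (score assumed for an unseen
    token) are illustrative values, not bogofilter defaults.
    """
    p = token_probability(pgood, pbad)
    return (s * x + n * p) / (s + n)


# Token present in every ham and every spam: pgood = pbad = 1,
# so p(w) = 0.5 and only the prior term s*x pulls away from 0.5.
for n in (1, 10, 100, 1000):
    print(n, token_score(n, pgood=1.0, pbad=1.0))
```

With pgood = pbad = 1 the score starts near x for small n and
moves toward 0.5 as n grows, so the token's deviation from
neutral shrinks instead of growing.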

pi



