flaw in spamicity calculation

Eric Seppanen eds at reric.net
Wed Sep 18 22:01:51 CEST 2002


On Wed, Sep 18, 2002 at 03:37:23PM -0400, David Relson wrote:
> 
> Methinks it's time to take a serious look at some alternate ideas for 
> converting individual word probabilities into a message's spamicity.

How about this: maybe we should stop capping the per-word spamicity at 
0.01 and 0.99.  Let the uncapped values determine what words make it into 
the extrema array.  Then, before running the Bayes-like combination on the 
extrema, cap the values as before, or apply a curve to the values to get 
them to fit.
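A rough sketch of that ordering, in Python rather than bogofilter's own C. The names select_extrema and cap, the word values, and the array size of 3 are made up for illustration and are not taken from the bogofilter source:

```python
def select_extrema(word_probs, count):
    """Pick the `count` words whose UNCAPPED spamicity is farthest from 0.5."""
    ranked = sorted(word_probs.items(),
                    key=lambda item: abs(item[1] - 0.5),
                    reverse=True)
    return ranked[:count]


def cap(p, low=0.01, high=0.99):
    """Clamp a per-word spamicity into [low, high]."""
    return min(max(p, low), high)


# Hypothetical uncapped per-word values.
word_probs = {
    "viagra": 0.9999,
    "free": 0.995,
    "meeting": 0.02,
    "the": 0.5,
    "patch": 0.001,
}

# Step 1: rank on the uncapped values.
extrema = select_extrema(word_probs, count=3)

# Step 2: cap only the selected values, just before combining them.
capped = [(word, cap(p)) for word, p in extrema]
```

If the cap were applied first, "viagra", "free" and "patch" would all sit exactly 0.49 away from 0.5 and the ranking among them would be arbitrary. Ranking on the uncapped values keeps them distinguishable.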

The capping is needed because, unless you change the Bayes-like 
calculation, any values very near 0, or very near or greater than 1, will 
override all the other values.
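To make the override concrete, here is a small Python sketch. It assumes the Bayes-like calculation is the usual product form, prod(p) / (prod(p) + prod(1 - p)); the function name combine and the sample values are mine, for illustration only:

```python
def combine(probs):
    """Assumed Bayes-like combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = 1.0
    ham = 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


# One uncapped value of exactly 0 among three strongly spammy words.
uncapped = [0.0, 0.99, 0.99, 0.99]
# The same list with the low value capped at 0.01.
capped = [0.01, 0.99, 0.99, 0.99]

uncapped_score = combine(uncapped)  # the single 0.0 zeroes the whole product
capped_score = combine(capped)      # the other three words still count
```

With the uncapped list the result is exactly 0 no matter how many spammy words are present. With the cap in place, the three 0.99 values outweigh the single 0.01 and the score stays above 0.99.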

For summary digest subscription: bogofilter-digest-subscribe at aotto.com


