Incorrigible spam

Tue Apr 13 13:48:49 CEST 2004

On 13 Apr 2004 07:32:11 -0400
Tom Anderson wrote:

> On Tue, 2004-04-13 at 01:14, Tom Anderson wrote:
> > > I frequently correct three spam and then get 3-7 ham that start 
> > > reporting unsure.  By iterating over the known body repeatedly I
> > > negate the effects of certain key phrases.
> > 
> > This appears nearly impossible with my wordlist.  I've never had a
> > ham score higher than 0.15.
> 
> I was just thinking about this again, and while I don't doubt your
> observation, something else must be influencing your result.  Think
> about this:
> 
> 1) you register three successive hams
> 2) you register one spam 5-10 times until it is sufficiently spammy
> 
> Now, is it possible that any of your previous three hams is no longer
> hammy?  For this to be true, it would need to be composed almost
> entirely of the tokens in the spam.  And if this were the case, then
> it couldn't have scored as ham with only one registration, right?  
> 
> Successively registering these hams and spams until they each score
> correctly will polarize the difference while neutralizing the
> intersection.  This is precisely what we would want to achieve.

pi has mentioned effects like that.  After a train-on-error pass,
additional passes will show "errors" that weren't in the original pass. 
Adding tokens to the wordlist does affect previous scores.  In most
cases the effect is very, very small.
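
To make the point concrete, here is a toy sketch in Python.  It is not
the scoring any real filter uses: ToyFilter, its smoothing constants and
the sample tokens are all made up for illustration.  It only shows that
registering a spam changes the scores of messages registered earlier.

```python
import math
from collections import Counter


class ToyFilter:
    """Hypothetical toy wordlist: per-token message counts for spam and ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0

    def register(self, tokens, is_spam):
        # Count each distinct token once per registered message.
        if is_spam:
            self.spam.update(set(tokens))
            self.nspam += 1
        else:
            self.ham.update(set(tokens))
            self.nham += 1

    def score(self, tokens):
        # Assumed scoring: sum smoothed log-odds over the distinct tokens,
        # then squash to a 0..1 "spamminess" value.
        logit = 0.0
        for tok in set(tokens):
            s = (self.spam[tok] + 0.5) / (self.nspam + 1)
            h = (self.ham[tok] + 0.5) / (self.nham + 1)
            logit += math.log(s / h)
        return 1.0 / (1.0 + math.exp(-logit))


# Made-up messages: one ham shares a single token with the spam,
# the other ham is composed almost entirely of the spam's tokens.
spam = ["free", "offer", "click", "today", "now"]
ham_distinct = ["meeting", "agenda", "budget", "review", "free"]
ham_overlap = ["free", "offer", "click", "today", "lunch"]

f = ToyFilter()
f.register(ham_distinct, is_spam=False)
f.register(ham_overlap, is_spam=False)
before = {"distinct": f.score(ham_distinct), "overlap": f.score(ham_overlap)}

# Register the same spam repeatedly, as in step 2 of the quoted scenario.
for _ in range(5):
    f.register(spam, is_spam=True)
after = {"distinct": f.score(ham_distinct), "overlap": f.score(ham_overlap)}

print("before:", before)
print("after: ", after)
```

In this toy model both ham scores move even though neither ham was
touched again.  The ham that shares only one token with the spam moves
further toward ham, while the one built almost entirely from the spam's
tokens drifts upward, which matches the argument quoted above.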