db maintenance "delete oldest least used tokens, but maintain count of x"

Tom Allison tallison at tacocat.net
Wed Mar 17 13:37:47 CET 2004


Boris 'pi' Piwinger wrote:
> David Relson wrote:
> 
> 
>>>Having looked (but not posted), it appears as though the MSG_COUNT was
>>>used to evaluate the individual spamicity of the token only, and hence
>>>dropping tokens (without changing their associated spam/ham counts)
>>>should be safe. This all providing that I haven't missed a reference
>>>to the MSG_COUNT.
>>
>>That's correct. AFAICT removing unwanted tokens from the wordlist is OK.
>>It has the obvious effects - smaller wordlist and tokens becoming
>>unknown.  It doesn't affect the remaining tokens.  Of course if someone
>>deletes the wrong tokens, the effects will be serious.
> 
> 
> The danger seems to be that once the tokens return, their
> values will be horribly wrong.
> 

True.
But I was thinking of something along the lines of
-a 300 -c 2
which should remove every token that is more than 300 days old and has
a total count below 2.
If I haven't seen a token in almost a year, I don't think it will matter
much if its value is horribly wrong when it returns.  After all,
shouldn't its score still be ~robx at that point?
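
A minimal sketch of the rule, to make the proposal concrete. This is not
bogoutil's code: TokenStats, prune_wordlist and robinson_fw are
hypothetical names, and the wordlist is assumed to be a plain dict of
token -> (spam count, ham count, last-seen date). A token is dropped
only if it is BOTH older than the age limit AND below the count limit.
The second function is Robinson's smoothing formula, f(w) = (s*x + n*p)
/ (s + n), which shows how far a rarely seen token can sit from robx.

```python
from datetime import date, timedelta
from typing import Dict, NamedTuple


class TokenStats(NamedTuple):
    """Hypothetical per-token record; not bogofilter's on-disk format."""
    spam: int
    ham: int
    last_seen: date


def prune_wordlist(wordlist: Dict[str, TokenStats], today: date,
                   max_age_days: int = 300,
                   min_count: int = 2) -> Dict[str, TokenStats]:
    """Drop tokens last seen more than max_age_days ago whose total
    count (spam + ham) is below min_count; keep everything else."""
    cutoff = today - timedelta(days=max_age_days)
    return {
        token: stats
        for token, stats in wordlist.items()
        if not (stats.last_seen < cutoff
                and stats.spam + stats.ham < min_count)
    }


def robinson_fw(p: float, n: int, robx: float, robs: float) -> float:
    """Robinson's smoothed score: n sightings with raw spamicity p,
    pulled toward robx with strength robs."""
    return (robs * robx + n * p) / (robs + n)


today = date(2004, 3, 17)
wordlist = {
    "viagra":       TokenStats(40, 0, date(2004, 3, 1)),   # recent, common
    "xyzzy":        TokenStats(1, 0, date(2003, 1, 5)),    # old and rare
    "oldbutcommon": TokenStats(5, 7, date(2003, 1, 5)),    # old, but count >= 2
    "newrare":      TokenStats(0, 1, date(2004, 3, 10)),   # rare, but recent
}
pruned = prune_wordlist(wordlist, today)
print(sorted(pruned))
```

Only "xyzzy" is removed here. On the robx question: a token with n = 0
scores exactly robx, but with n = 1 the result depends on robs. With a
large robs it stays near robx; with a small robs it moves most of the
way to the raw value p, so "~robx" only holds if robs is not tiny.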




