db maintenance "delete oldest least used tokens, but maintain count of x"
Tom Allison
tallison at tacocat.net
Wed Mar 17 13:37:47 CET 2004
Boris 'pi' Piwinger wrote:
> David Relson wrote:
>
>>>Having looked (but not posted), it appears as though the MSG_COUNT was
>>>used to evaluate the individual spamicity of the token only, and hence
>>>dropping tokens (without changing their associated spam/ham counts)
>>>should be safe. This all providing that I haven't missed a reference
>>>to the MSG_COUNT.
>>
>>That's correct. AFAICT removing unwanted tokens from the wordlist is OK.
>>It has the obvious effects - smaller wordlist and tokens becoming
>>unknown. It doesn't affect the remaining tokens. Of course if someone
>>deletes the wrong tokens, the effects will be serious.
>
> The danger seems to be that once the tokens return, there
> values will be horribly wrong.
>
True.
But I was thinking of something along the lines of
-a 300 -c 2
should remove every token older than 300 days that has fewer than 2 counts total.
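A minimal sketch of that pruning rule, assuming a hypothetical in-memory wordlist where each token carries a spam count, a ham count and a last-seen date. `TokenStats` and `prune` are illustrative names, not bogofilter or bogoutil code, and whether bogoutil implements exactly these `-a`/`-c` semantics is not assumed here. A token is dropped only when it is both stale and rare; the message totals (MSG_COUNT in this thread) are not part of the record and are never touched.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TokenStats:
    # Hypothetical per-token record, not bogofilter's on-disk format.
    spam: int          # spam messages that contained the token
    ham: int           # ham messages that contained the token
    last_seen: date    # last date the token appeared in a registered message


def prune(wordlist, today, max_age_days=300, min_count=2):
    """Return a copy of wordlist without tokens that are older than
    max_age_days AND have a spam+ham total below min_count."""
    cutoff = today - timedelta(days=max_age_days)
    kept = {}
    for token, st in wordlist.items():
        stale = st.last_seen < cutoff           # age strictly greater than max_age_days
        rare = st.spam + st.ham < min_count     # total count below the threshold
        if not (stale and rare):
            kept[token] = st
    return kept


today = date(2004, 3, 17)
wl = {
    "viagra": TokenStats(40, 0, date(2004, 3, 1)),  # recent: kept
    "zyxxy": TokenStats(0, 1, date(2003, 1, 2)),    # stale and rare: removed
    "bogon": TokenStats(3, 2, date(2002, 6, 1)),    # stale but common: kept
}
print(sorted(prune(wl, today)))  # ['bogon', 'viagra']
```

Requiring both conditions keeps old but well-established tokens, which still carry useful evidence, and only discards the one-off tokens that mostly bloat the database.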
If I haven't seen it in almost a year, I don't think the effect of it
being horribly wrong will matter. After all, shouldn't it still be
~robx at this point?
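To see why a removed token that later returns starts out at robx, here is a sketch of Robinson's per-token estimate f(w) = (s*x + n*p(w)) / (s + n), where x is robx, s is robs, n is the number of messages containing the token, and p(w) is computed from the token's counts and the message totals. The `robx=0.5` and `robs=1.0` values below are illustrative only, not bogofilter's defaults, and the function name is hypothetical. It also shows the point made earlier in the thread: a token's estimate depends only on its own counts and the message totals, so deleting other tokens leaves it unchanged.

```python
def robinson_fw(spam, ham, n_spam_msgs, n_ham_msgs, robx=0.5, robs=1.0):
    """Robinson's smoothed spamicity for one token.

    Assumes n_spam_msgs and n_ham_msgs (the message totals) are positive.
    robx/robs defaults here are illustrative, not bogofilter's defaults.
    """
    n = spam + ham
    if n == 0:
        return robx                  # unknown token: only the prior remains
    b = spam / n_spam_msgs           # fraction of spam messages with the token
    g = ham / n_ham_msgs             # fraction of ham messages with the token
    p = b / (b + g)                  # unsmoothed spamicity
    return (robs * robx + n * p) / (robs + n)


print(robinson_fw(0, 0, 100, 100))  # 0.5  (deleted or never-seen token)
print(robinson_fw(0, 1, 100, 100))  # 0.25 (one ham hit, pulled halfway)
```

How close a low-count token sits to robx depends on robs: with a larger robs, a token seen only once stays near robx, while with a small robs a single occurrence moves it most of the way to p(w).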