db maintenance "delete oldest least used tokens, but maintain count of x"
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Mar 17 07:23:02 EST 2004
David Relson wrote:
>> Having looked (but not posted), it appears as though the MSG_COUNT was
>> used to evaluate the individual spamicity of the token only, and hence
>> dropping tokens (without changing their associated spam/ham counts)
>> should be safe. This is all provided that I haven't missed a reference
>> to the MSG_COUNT.
> That's correct. AFAICT removing unwanted tokens from the wordlist is OK.
> It has the obvious effects - smaller wordlist and tokens becoming
> unknown. It doesn't affect the remaining tokens. Of course if someone
> deletes the wrong tokens, the effects will be serious.
The danger seems to be that once the tokens return, their
values will be horribly wrong.
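A minimal Python sketch of the point being discussed, assuming a simplified wordlist held in a dict. An entry keyed `.MSG_COUNT` stands in for the MSG_COUNT discussed above and holds the total (spam, ham) message counts, and every other key maps a token to its own (spam, ham) counts. The per-token score is a plain Graham-style ratio, not bogofilter's actual calculation, and the function names are hypothetical. It shows that deleting one token leaves the other tokens' scores unchanged, and that a deleted token which reappears is scored from its new counts alone, against message totals that still include the old mail.

```python
MSG_COUNT = ".MSG_COUNT"  # hypothetical key for the (spam, ham) message totals


def spamicity(wordlist, token):
    """Simplified Graham-style spam probability (not bogofilter's exact formula)."""
    n_spam, n_ham = wordlist[MSG_COUNT]
    spam, ham = wordlist.get(token, (0, 0))
    spam_ratio = spam / n_spam if n_spam else 0.0
    ham_ratio = ham / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0:
        return None  # unknown token: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


def delete_tokens(wordlist, doomed):
    """Return a copy without the doomed tokens; MSG_COUNT is left untouched."""
    return {k: v for k, v in wordlist.items() if k == MSG_COUNT or k not in doomed}


def register_message(wordlist, tokens, is_spam):
    """Train one message in place: bump the message total and each token's count."""
    n_spam, n_ham = wordlist[MSG_COUNT]
    wordlist[MSG_COUNT] = (n_spam + 1, n_ham) if is_spam else (n_spam, n_ham + 1)
    for tok in set(tokens):
        spam, ham = wordlist.get(tok, (0, 0))
        wordlist[tok] = (spam + 1, ham) if is_spam else (spam, ham + 1)


if __name__ == "__main__":
    wl = {MSG_COUNT: (100, 100), "meeting": (2, 30), "lunch": (5, 5)}
    before = spamicity(wl, "meeting")
    wl = delete_tokens(wl, {"lunch"})
    print(spamicity(wl, "meeting") == before)  # True: other tokens unaffected
    print(spamicity(wl, "lunch"))              # None: "lunch" is now unknown
    register_message(wl, ["lunch"], is_spam=True)
    print(spamicity(wl, "lunch"))              # 1.0: scored from one new spam only
```

In this toy model "lunch" scores 0.5 before deletion and 1.0 after a single new spam, which is the kind of skew a returning token can show once its history is gone.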