db maintenance "delete oldest least used tokens, but maintain count of x"

Matthias Andree matthias.andree at gmx.de
Fri Mar 5 01:51:58 CET 2004


On Thu, 04 Mar 2004, Chris Fortune wrote:

> Anybody have any ideas how to implement this db maintenance rule?:
> 
> "delete oldest least used tokens, but maintain count of x data rows", x being 100,000.
> 
> It would be good to keep balance of spammy : hammy tokens?

I wonder how all these radical DB maintenance functions are going to
keep the .MSG_COUNT value at some reasonable level. People worry
about the ham:spam token ratio, but the more important
token count:mail count ratio isn't questioned.

Did I miss research that resulted in "you can twist .MSG_COUNT or the
database all you want without adverse effects"? I believe not, and
that's the reason why I have never used maintenance functions like
these.

I wonder if we should record how often each token is read, or timestamp
tokens on read, just for fun, so someone can set a size limit and
implement some LRU strategy. Needless to say, it would perform awfully
slowly, since every lookup would also become a write.
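
To make the LRU idea concrete, here is a minimal sketch in Python. It is
a toy in-memory model, not bogofilter's database code: the TokenStore
class, its method names and the max_tokens limit are all hypothetical,
and a logical counter stands in for a real timestamp. Note that prune()
leaves the message counts alone, which is exactly the open question
above.

```python
import itertools


class TokenStore:
    """Toy in-memory wordlist (hypothetical; not bogofilter's real layout)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens      # the "x" in "maintain count of x"
        self.counts = {}                  # token -> [spam_count, ham_count]
        self.last_read = {}               # token -> logical time of last read
        self._clock = itertools.count(1)  # stand-in for a real timestamp

    def register(self, token, spam=0, ham=0):
        """Add counts for a token, stamping it if it is new."""
        entry = self.counts.setdefault(token, [0, 0])
        entry[0] += spam
        entry[1] += ham
        self.last_read.setdefault(token, next(self._clock))

    def lookup(self, token):
        """Return the counts and refresh the stamp: a write on every read."""
        entry = self.counts.get(token)
        if entry is not None:
            self.last_read[token] = next(self._clock)
        return entry

    def prune(self):
        """Evict the least recently read tokens until max_tokens remain."""
        excess = len(self.counts) - self.max_tokens
        if excess <= 0:
            return []
        victims = sorted(self.last_read, key=self.last_read.get)[:excess]
        for token in victims:
            del self.counts[token]
            del self.last_read[token]
        return victims


store = TokenStore(max_tokens=2)
store.register("viagra", spam=5)
store.register("meeting", ham=3)
store.register("invoice", spam=1, ham=1)
store.lookup("viagra")    # refreshes "viagra", so "meeting" is now oldest
evicted = store.prune()   # drops one token to get back down to two
```

The stamp refresh in lookup() is where the cost comes from: scoring a
message turns every token read into a database write.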

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95




More information about the Bogofilter mailing list