db maintenance "delete oldest least used tokens, but maintain count of x"
Matthias Andree
matthias.andree at gmx.de
Fri Mar 5 01:51:58 CET 2004
On Thu, 04 Mar 2004, Chris Fortune wrote:
> Anybody have any ideas how to implement this db maintenance rule?:
>
> "delete oldest least used tokens, but maintain count of x data rows", x being 100,000.
>
> It would be good to keep balance of spammy : hammy tokens?
I wonder how all these radical DB maintenance functions are going to
adjust the .MSG_COUNT value to something reasonable. People are
bothered about the ham:spam token ratio, but the more important
token count:mail count ratio isn't questioned.
Did I miss research that resulted in "you can twist .MSG_COUNT or the
database all you want without adverse effects"? I believe not, and
that's the reason why I have never used maintenance functions like
these.
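To make the concern concrete, here is a small sketch of why the ratio matters. It assumes a simplified per-token estimate in which hit counts are normalized by the number of messages in each class; the function name and the numbers are made up for illustration, and this is not the actual scoring code.

```python
def token_spamicity(spam_hits, ham_hits, spam_msgs, ham_msgs):
    """Simplified per-token estimate (an assumption for illustration):
    hit counts are normalized by the message count of each class."""
    spam_rate = spam_hits / spam_msgs
    ham_rate = ham_hits / ham_msgs
    return spam_rate / (spam_rate + ham_rate)


# Same token counts, scored against two different message counts.
before = token_spamicity(spam_hits=30, ham_hits=10, spam_msgs=1000, ham_msgs=1000)

# Hypothetical maintenance pass that halves the ham message count
# but leaves this token's counts alone.
after = token_spamicity(spam_hits=30, ham_hits=10, spam_msgs=1000, ham_msgs=500)

print(f"before: {before:.2f}, after: {after:.2f}")
```

The token counts are identical in both calls, yet the estimate moves, which is why changing the message counts or the token counts independently of each other looks risky.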
I wonder if we should record the token frequency or timestamp tokens
when they are read, just for fun, so someone can set a size limit and
implement some LRU strategy. Needless to say it would be an awfully
slow performer.
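A minimal in-memory sketch of that LRU idea, assuming a hypothetical LruTokenStore class that keeps tokens in recency order and evicts the least recently touched ones once a size limit is exceeded. None of these names exist in bogofilter, and a real on-disk database would behave quite differently.

```python
from collections import OrderedDict


class LruTokenStore:
    """Hypothetical token store with a size limit and LRU eviction."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self._tokens = OrderedDict()  # token -> [spam_count, ham_count]

    def read(self, token):
        """Return the counts for a token and mark it as recently used."""
        counts = self._tokens.get(token)
        if counts is not None:
            # The "timestamp after read": move the token to the fresh end.
            self._tokens.move_to_end(token)
        return counts

    def register(self, token, spam=0, ham=0):
        """Add counts for a token, then evict the stalest tokens if needed."""
        counts = self._tokens.setdefault(token, [0, 0])
        counts[0] += spam
        counts[1] += ham
        self._tokens.move_to_end(token)
        evicted = []
        while len(self._tokens) > self.max_tokens:
            old_token, _ = self._tokens.popitem(last=False)
            evicted.append(old_token)
        return evicted


store = LruTokenStore(max_tokens=2)
store.register("viagra", spam=1)
store.register("meeting", ham=1)
store.read("viagra")                      # refreshes "viagra"
evicted = store.register("invoice", spam=1)
print(evicted)                            # the token that was not touched recently
```

Note that every read updates the recency order; against an on-disk database that would presumably turn each lookup into a write, which is where the slowness would come from. The sketch also does nothing about .MSG_COUNT when it evicts a token.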
--
Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95