db maintenance "delete oldest least used tokens, but maintain count of x"

Tom Allison tallison at tacocat.net
Thu Mar 11 12:16:08 CET 2004


Greg Louis wrote:
> I would agree with this, yes.  And if performance needs dictate a
> culling of the training db, I think the right way to do that is to
> chuck it out and build a new one from current spam and nonspam (of
> course this is only practical if you get a fairly large amount of mail,
> but if you don't, you're not likely to suffer performance problems
> anyhow).
> 

I think the way to manage this is to maintain an archive of ham and 
spam, as classified by your current bogofilter system, and to keep only 
the most recent 2000 messages of each.

This would allow you to rebuild the database from that archive whenever 
needed, so that it reflects the most recent trends in spam content.
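
A rough sketch of the pruning step, assuming each archived message is 
stored as its own file in a per-class directory. The function name 
prune_archive and the directory names in the note below are made up for 
illustration and are not part of bogofilter; "most recent" is taken to 
mean the file modification time.

```python
from pathlib import Path


def prune_archive(directory, keep=2000):
    """Delete all but the `keep` most recently modified files in `directory`.

    Assumes one message per file. Returns the list of removed paths.
    """
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    # Sort newest first, using the modification time as the message age.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    # Everything past the first `keep` entries is older than the cutoff.
    removed = files[keep:]
    for path in removed:
        path.unlink()
    return removed
```

Calling prune_archive("archive/ham") and prune_archive("archive/spam") 
(hypothetical paths) would leave at most 2000 messages in each, and a 
fresh database could then be trained from what remains.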




