db maintenance "delete oldest least used tokens, but maintain count of x"

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Sat Mar 13 19:25:02 CET 2004


Tom Allison <tallison at tacocat.net> wrote:

>I think the way to manage this is to maintain an archive of at least 
>2000 ham and 2000 spam as determined by your current bogofilter system. 
>  And to keep only the most recent 2000 of each.
>
>This would allow you to build a database based on the most recent trends 
>of spam content.

This technique can also be combined with training-to-exhaustion.
But there is no need to rebuild the database: just do any
retraining with recent messages only. This works surprisingly
well, and you don't lose possibly important older
information. Since the size problem does not arise this way,
there is no need to delete the database.
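
A rough sketch of the idea in Python, not tied to bogofilter's
actual interface: the message tuples, most_recent(),
train_to_exhaustion(), and the classify/train callables are all
hypothetical names made up for illustration. It keeps the existing
database and only runs the retraining passes over the newest
messages.

```python
from typing import Callable, List, Tuple

# Hypothetical message record: (timestamp, text, is_spam).
Message = Tuple[float, str, bool]


def most_recent(messages: List[Message], limit: int) -> List[Message]:
    """Return the `limit` newest messages, newest first."""
    return sorted(messages, key=lambda m: m[0], reverse=True)[:limit]


def train_to_exhaustion(
    messages: List[Message],
    classify: Callable[[str], bool],     # assumed: True means "spam"
    train: Callable[[str, bool], None],  # assumed: registers text as spam/ham
    max_rounds: int = 10,
) -> int:
    """Train on misclassified messages until one full pass has no errors.

    The existing database is never rebuilt; only errors are trained.
    Returns the number of passes made (at most `max_rounds`).
    """
    for round_no in range(1, max_rounds + 1):
        errors = 0
        for _, text, is_spam in messages:
            if classify(text) != is_spam:
                train(text, is_spam)
                errors += 1
        if errors == 0:
            return round_no
    return max_rounds
```

With archives of ham and spam, one would call something like
train_to_exhaustion(most_recent(ham, 2000) + most_recent(spam, 2000),
classify, train), where classify and train wrap whatever filter is
in use.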

pi



