Fw: Wordlist too big

Thomas Anderson tanderso at oac-design.com
Thu Aug 9 14:12:43 CEST 2007


On Thu, 2007-08-09 at 09:25 +0200, Matthias Andree wrote:
> On Wed, 08 Aug 2007, David Relson forwarded this message:
> > The question is: The wordlist.db is becoming too big and growing fast,
> > about 50 MB per week. It's already 500 MB. I would like to know if there
> > is any limitation on it, recommendation or anything, so it doesn't
> > affect my performance.
> 
> If using the "-u" option, try without -- and note you'll probably have
> to adjust training scripts (turn -Ns into -s and turn -Sn into -n).

Or you could just do nothing and let it play out.  The growth of your
wordlist should approximate a logistic function... that is, it will grow
roughly exponentially at first, and then more and more slowly, leveling
off as you begin to exhaust the possible token space.  Like Thomas
Malthus' unfounded fears about human population growth, your concern
about a rapidly growing wordlist is likely unfounded.  Its growth should
soon slow considerably.
Undermining bogofilter's ability to update the wordlist with appropriate
tokens will only hamper accuracy.  That said, it couldn't hurt to purge
old hapaxes occasionally, and using thresh_update to slow wordlist
growth once accuracy is high might also serve you well.
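To make the shape of that curve concrete, here is a minimal Python
sketch of a logistic function and the growth it adds each week.  The
capacity, rate, and midpoint values are hypothetical numbers picked for
illustration, not measurements of any real bogofilter wordlist; the
only point is that weekly growth peaks near the midpoint and then
shrinks as the size approaches its ceiling.

```python
import math


def logistic(t: float, capacity: float, rate: float, midpoint: float) -> float:
    """Logistic curve: near-exponential growth early, leveling off at `capacity`."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def weekly_increments(weeks: int, capacity: float, rate: float,
                      midpoint: float) -> list[float]:
    """Size added during each week 1..weeks under the logistic model."""
    sizes = [logistic(w, capacity, rate, midpoint) for w in range(weeks + 1)]
    return [sizes[w] - sizes[w - 1] for w in range(1, weeks + 1)]


if __name__ == "__main__":
    # Hypothetical parameters, NOT fitted to any real wordlist:
    # ceiling 1000 MB, rate 0.5 per week, inflection point at week 10.
    for week, delta in enumerate(weekly_increments(30, 1000.0, 0.5, 10.0), start=1):
        print(f"week {week:2d}: +{delta:6.1f} MB")
```

Early weeks show accelerating growth; after the inflection point each
week adds less than the one before, which is the "slowing down" the
argument above relies on.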

Tom
