massive disk space leak vs thresh_update

David Relson relson at osagesoftware.com
Sat Dec 11 05:46:29 CET 2004


Greetings,

Debian bug #284452 reports a problem of high disk usage when
using '-u' (autoupdate) with 0.93.x versions of bogofilter.  When traffic
is high, '-u' causes many wordlist updates, and those updates cause many
logfiles to be created.

One solution is to use the thresh_update option with '-u'.  A
setting of thresh_update=0.01 means that low scoring messages (scores
from 0.00 to 0.01) and high scoring messages (scores from 0.99 to
1.00) aren't automagically added to the wordlist.
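
As a rough illustration of the rule just described, here is a small
Python sketch.  It is not bogofilter's code: the function name
should_autoupdate is made up for this example, and treating the
boundary scores as inclusive is an assumption based on the ranges
given above.

```python
def should_autoupdate(score: float, thresh_update: float = 0.01) -> bool:
    """Return True if a message with this score would be added to the
    wordlist under '-u'.  Hypothetical helper, not bogofilter's API.

    Messages scoring within thresh_update of either extreme (0.0 or 1.0)
    are already classified with high confidence, so they are skipped.
    Inclusive boundaries are an assumption.
    """
    if thresh_update <= 0.0:
        # Assumed: a threshold of zero disables the check entirely.
        return True
    clearly_ham = score <= thresh_update
    clearly_spam = score >= 1.0 - thresh_update
    return not (clearly_ham or clearly_spam)


if __name__ == "__main__":
    for s in (0.0, 0.005, 0.30, 0.50, 0.995, 1.0):
        print(f"score={s:.3f} update={should_autoupdate(s)}")
```

Only messages whose scores fall between the two bands would trigger a
wordlist update, which is why the number of updates (and so logfiles)
can drop sharply when most mail scores near 0 or 1.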

I started using thresh_update=0.01 in January 2004.  The growth rate
of my database has slowed dramatically and the scoring accuracy has
not been affected.  Since installing 0.93.0 on Nov 11, my site has
been processing about 1,000 messages per day.  During this month-long
period, only 3 logfiles (totalling 23MB) have been created.

As thresh_update only affects folks using '-u' and as it has distinct
benefits, I've been thinking that "thresh_update=0.01" should become
part of bogofilter's default configuration.

What do y'all think?

David

P.S.  If you want to view the complete Debian bug report, it's at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=284452
