massive disk space leak vs thresh_update

David Relson relson at osagesoftware.com
Sat Dec 11 13:10:55 CET 2004


On Sat, 11 Dec 2004 11:48:18 +0100
Matthias Andree wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > As thresh_update only affects folks using '-u' and as it has
> > distinct benefits, I've been thinking that "thresh_update=0.01"
> > should become part of bogofilter's default configuration.
> >
> > What do y'all think?
> 
> I think we should disable -u for the nonce until we have solid data on
> the "learning" that -u is supposed to do. It is not clear that this
> option actually does what we want, and it can easily be emulated from
> procmail or maildrop for those who still want it.

Matthias,

No.  Disabling '-u' is a code change that would force me to run a
patched version of bogofilter and I'm unwilling to do that.

Using a non-zero value of thresh_update significantly reduces disk
usage.  It has a moderate effect on wordlist.db size and a major
effect on the Berkeley DB logfiles.

My first thought is to default thresh_update to 0.01.  However, the
default spam_cutoff is 0.99, so a message classified as spam already
scores within 0.01 of 1.0; the two settings combined would block
autoupdating of spam (but not ham).  A thresh_update of 0.005 should
work.
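
To make the interaction between the two settings concrete, here is a
minimal sketch in Python.  It is not bogofilter's code.  It assumes that
'-u' skips the update whenever the score lies within thresh_update of
0.0 or 1.0, and it ignores the "unsure" band by treating everything
below spam_cutoff as ham.  The function name autoupdate_action and the
exact boundary comparisons are illustrative assumptions.

```python
def autoupdate_action(score, spam_cutoff=0.99, thresh_update=0.0):
    """Return what a '-u' run would do with a message of the given score.

    Assumed rule (not taken from bogofilter's source): the update is
    skipped when the score is within thresh_update of 0.0 or 1.0.
    """
    # Simplification: no "unsure" band, anything below spam_cutoff is ham.
    is_spam = score >= spam_cutoff

    sure_ham = score < thresh_update          # too close to 0.0
    sure_spam = score > 1.0 - thresh_update   # too close to 1.0
    if sure_ham or sure_spam:
        return "skip"
    return "register-spam" if is_spam else "register-ham"


# Compare the two candidate defaults on a few sample scores.
for thresh in (0.01, 0.005):
    for score in (0.001, 0.2, 0.992, 0.998):
        print(thresh, score, autoupdate_action(score, thresh_update=thresh))
```

Under these assumptions, with thresh_update=0.01 any score above 0.99
is skipped, which covers essentially every message at or above
spam_cutoff.  With thresh_update=0.005, spam scoring between 0.99 and
0.995 is still registered.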

A second thought is to suggest adding a cron job to run db_checkpoint
and/or db_archive.  People who don't want logfiles using lots of disk
space won't want to save the logfiles, so letting Berkeley DB delete
them is reasonable.
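
A rough sketch of what such a cron job could run, written as a small
Python wrapper.  The script name bogo_cleanup.py and the function names
are hypothetical.  It assumes the Berkeley DB utilities are installed
under the names db_checkpoint and db_archive (some systems add a version
suffix) and that the directory given on the command line is the
bogofilter database environment.

```python
import subprocess
import sys


def cleanup_commands(db_home):
    """Build the Berkeley DB maintenance commands for one environment."""
    return [
        # -1: write a single checkpoint, then exit; -h: environment home
        ["db_checkpoint", "-1", "-h", db_home],
        # -d: remove log files that are no longer needed
        ["db_archive", "-d", "-h", db_home],
    ]


def run_cleanup(db_home):
    """Checkpoint first, then let db_archive delete the unneeded logs."""
    for cmd in cleanup_commands(db_home):
        subprocess.run(cmd, check=True)  # raises if a utility fails


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: bogo_cleanup.py DB_HOME")
    else:
        run_cleanup(sys.argv[1])
```

Running the checkpoint before db_archive matters, since only logfiles
older than the last checkpoint can be removed.  Anyone who wants to keep
logfiles for catastrophic recovery should not use the -d step.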

Regards,

David


