massive disk space leak vs thresh_update

Tom Anderson tanderso at oac-design.com
Mon Dec 13 01:52:07 CET 2004


On Fri, 2004-12-10 at 23:46, David Relson wrote:

> As thresh_update only affects folks using '-u' and as it has distinct
> benefits, I've been thinking that "thresh_update=0.01" should become
> part of bogofilter's default configuration.

I think that's a fine idea.  I've been using -u since the beginning, and
I've been using thresh_update to good effect since I upgraded to
0.92.8.  I don't see that it harms even early training, since highly
polarized emails (those already scoring very close to 0 or 1) add
little to further training anyway.  With the new transactional version,
setting a default thresh_update seems a prudent way to protect newbies
from ill effects.

> My first thought is to default thresh_update to 0.01.  However the
> default spam_cutoff is 0.99 and the two factors combined would block
> autoupdating of spam (but not ham).  A thresh_update of 0.005 should
> work.

I can't see that initially not updating on spam would be a terrible
idea.  In all likelihood, people will be training on misclassified spam
more often than on misclassified ham anyway, so it should balance out.
Eventually they will lower their spam_cutoff, at which point
thresh_update will allow autoupdating of spam too, unless they
consciously change it.
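
To make the interaction between thresh_update and spam_cutoff concrete,
here is a rough sketch in Python.  It is not bogofilter's code: the
function name is made up, and I am assuming that '-u' skips the update
when a score lies within thresh_update of 0.0 or 1.0.  Whether the
comparison at the exact boundary is strict is also an assumption.

```python
def autoupdate_allowed(score, thresh_update):
    """Hypothetical model of the '-u' decision.

    Assumption: the update is skipped when the score is within
    thresh_update of either extreme (0.0 or 1.0).
    """
    if thresh_update <= 0.0:
        # With no threshold, every message is used for autoupdating.
        return True
    return thresh_update < score < 1.0 - thresh_update


if __name__ == "__main__":
    spam_cutoff = 0.99  # default mentioned in the quoted message
    for thresh in (0.01, 0.005):
        for score in (0.30, 0.992, 0.999):
            label = "spam" if score >= spam_cutoff else "not spam"
            verdict = "update" if autoupdate_allowed(score, thresh) else "skip"
            print(f"thresh_update={thresh} score={score} ({label}): {verdict}")
```

Under these assumptions, a spam scoring 0.992 is skipped with
thresh_update=0.01 but registered with thresh_update=0.005, which is
the effect David describes.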

> A second thought is to suggest adding a cron job to run db_checkpoint
> and/or db_archive.  People who don't want logfiles using lots of disk
> space won't want to save the logfiles, so letting Berkeley DB delete
> them is reasonable.

Is this necessarily a dichotomy?  Why not do both?  Anything that is
commonly requested or implemented should be suggested early in the
docs.
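
For the cron job, something along these lines might do as a starting
point.  It is only a sketch: the script itself and the ~/.bogofilter
location are my assumptions, and the Berkeley DB utilities may be
installed under different names on some systems.  It forces a single
checkpoint with "db_checkpoint -1" and then removes logfiles that are
no longer needed with "db_archive -d".  By default it only prints the
commands; pass --run to execute them.

```python
import os
import subprocess
import sys


def maintenance_commands(db_home):
    """Build the Berkeley DB utility calls for one maintenance pass."""
    return [
        # -1: force one checkpoint, then exit; -h: database environment dir
        ["db_checkpoint", "-1", "-h", db_home],
        # -d: remove logfiles that are no longer needed
        ["db_archive", "-d", "-h", db_home],
    ]


def run_maintenance(db_home, execute=False):
    """Print each command; run it as well when execute is True."""
    for cmd in maintenance_commands(db_home):
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed wordlist directory; adjust to the local setup.
    home = os.path.expanduser("~/.bogofilter")
    run_maintenance(home, execute="--run" in sys.argv[1:])
```

Anyone who wants to keep the logfiles for recovery should leave out the
db_archive step.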

Tom




