massive disk space leak vs thresh_update
Tom Anderson
tanderso at oac-design.com
Mon Dec 13 01:52:07 CET 2004
On Fri, 2004-12-10 at 23:46, David Relson wrote:
> As thresh_update only affects folks using '-u' and as it has distinct
> benefits, I've been thinking that "thresh_update=0.01" should become
> part of bogofilter's default configuration.
I think that's a fine idea. I've been using -u since the beginning, and
I've been using thresh_update to good effect since I upgraded to
0.92.8. I don't see that it harms even early training as highly
polarized emails are fairly insignificant for further training anyway.
With the new transactional version, it would seem that setting a default
thresh_update would be a prudent move to prevent ill effects for
newbies.
> My first thought is to default thresh_update to 0.01. However the
> default spam_cutoff is 0.99 and the two factors combined would block
> autoupdating of spam (but not ham). A thresh_update of 0.005 should
> work.
I don't think it would be a terrible idea to skip autoupdating on spam
initially. In all likelihood, people will be training on misclassified
spam more often than on misclassified ham anyway, so it should balance
out. Eventually they will lower their spam_cutoff, at which point
thresh_update will allow autoupdating of spam too, unless they
consciously change it.
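
To make the interaction between spam_cutoff and thresh_update concrete,
here is a rough Python sketch of the rule as described in this thread. It
is not bogofilter's actual code: the function name would_autoupdate is
hypothetical, the exact comparisons are an assumption, and the sketch
assumes a simple two-state (spam/ham) classification with no "unsure"
range.

```python
def would_autoupdate(score, spam_cutoff=0.99, thresh_update=0.01):
    """Return "spam" or "ham" if -u would register the message, else None.

    Assumed rule (a reading of the thread, not taken from the source code):
    a message is skipped when its score lies within thresh_update of
    0.0 (very sure ham) or 1.0 (very sure spam).
    """
    if score >= spam_cutoff:
        # Classified as spam: register only if not too close to 1.0.
        return "spam" if score <= 1.0 - thresh_update else None
    # Otherwise treated as ham: register only if not too close to 0.0.
    return "ham" if score >= thresh_update else None


if __name__ == "__main__":
    # Defaults discussed above: anything scoring over 0.99 is skipped.
    print(would_autoupdate(0.992))
    # A smaller thresh_update leaves a window for spam between 0.99 and 0.995.
    print(would_autoupdate(0.992, thresh_update=0.005))
    # A lowered spam_cutoff opens a window even with thresh_update=0.01.
    print(would_autoupdate(0.97, spam_cutoff=0.95))
```

Under these assumptions, the defaults leave almost no room for spam to be
autoupdated, while either a smaller thresh_update or a lower spam_cutoff
opens a band of scores that still gets registered.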
> A second thought is to suggest adding a cron job to run db_checkpoint
> and/or db_archive. People who don't want logfiles using lots of disk
> space won't want to save the logfiles, so letting Berkeley DB delete
> them is reasonable.
Is this necessarily a dichotomy? Why not do both of these things?
Anything which is commonly requested or implemented should be suggested
early in the docs.
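
As an illustration of the kind of periodic cleanup job mentioned above,
here is a minimal Python sketch that could be run from cron. The helper
names maintenance_commands and run_maintenance are hypothetical, and the
database directory passed in is an assumption (adjust it to wherever your
wordlist lives). It relies only on the standard Berkeley DB utilities
db_checkpoint (-1 forces a single checkpoint) and db_archive (-d removes
log files that are no longer needed); some systems install these under
versioned names.

```python
import subprocess


def maintenance_commands(db_home):
    """Build the Berkeley DB commands for one cleanup pass over db_home."""
    return [
        # Force one checkpoint so older log files become removable.
        ["db_checkpoint", "-1", "-h", db_home],
        # Delete log files that are no longer needed for recovery.
        ["db_archive", "-d", "-h", db_home],
    ]


def run_maintenance(db_home, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in maintenance_commands(db_home):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a utility fails.
            subprocess.run(cmd, check=True)
```

Calling run_maintenance(path, dry_run=False) from a nightly cron entry
would checkpoint first and then remove unneeded logs. Note that deleting
the logs gives up the ability to use them for catastrophic recovery, which
is the trade-off David describes.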
Tom