massive disk space leak vs thresh_update

David Relson relson at osagesoftware.com
Mon Dec 13 02:00:05 CET 2004


On 12 Dec 2004 19:52:07 -0500
Tom Anderson wrote:

> On Fri, 2004-12-10 at 23:46, David Relson wrote:
> 
> > As thresh_update only affects folks using '-u' and as it has
> > distinct benefits, I've been thinking that "thresh_update=0.01"
> > should become part of bogofilter's default configuration.
> 
> I think that's a fine idea.  I've been using -u since the beginning,
> and I've been using thresh_update to good effect since I upgraded to
> 0.92.8.  I don't see that it harms even early training as highly
> polarized emails are fairly insignificant for further training anyway.
> 
> With the new transactional version, it would seem that setting a
> default thresh_update would be a prudent move to prevent ill effects
> for newbies.
> 
> > My first thought is to default thresh_update to 0.01.  However the
> > default spam_cutoff is 0.99 and the two factors combined would block
> > autoupdating of spam (but not ham).  A thresh_update of 0.005 should
> > work.
> 
> I can't see that initially not updating on spam would be a terrible
> idea.  In all likelihood, people will be training spams on error more
> than hams anyway, so it should balance out.  Eventually they will set
> their spam_cutoff lower, at which point the thresh_update will provide
> autoupdating of spams too, unless they consciously change it.

Interesting thought.  With the defaults of spam_cutoff=0.99,
ham_cutoff=0.10, and thresh_update=0.01, autoupdate will only register
ham, and spam registration will only happen through manual action.  It's
lopsided, which is a new and different situation, about which I'll need
to think...
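
To make the arithmetic concrete, here is a small Python sketch of how the
cutoffs and thresh_update interact.  It is a model, not bogofilter's code:
the function names are made up for illustration, and it assumes that '-u'
skips registration whenever the score lies within thresh_update of 0.0 or
1.0 (how the exact boundary values are handled is a guess).

```python
# Illustrative model only; names and boundary handling are assumptions,
# not taken from bogofilter's source.

def classify(score, spam_cutoff=0.99, ham_cutoff=0.10):
    """Label a score using the default cutoffs mentioned in this thread."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def would_autoupdate(score, thresh_update):
    """Assumed rule: skip scores within thresh_update of 0.0 or 1.0."""
    return thresh_update < score < 1.0 - thresh_update


if __name__ == "__main__":
    for thresh in (0.01, 0.005):
        for score in (0.002, 0.05, 0.992, 0.999):
            print(thresh, score, classify(score),
                  would_autoupdate(score, thresh))
```

Under these assumptions, with thresh_update=0.01 every score that reaches
spam_cutoff=0.99 is already inside the skipped band, so no spam is
registered automatically, while a ham scoring 0.05 still is.  With
thresh_update=0.005, a spam scoring 0.992 would be registered.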

> > A second thought is to suggest adding a cron job to run
> > db_checkpoint and/or db_archive.  People who don't want logfiles
> > using lots of disk space won't want to save the logfiles, so letting
> > Berkeley DB delete them is reasonable.
> 
> Is this necessarily a dichotomy?  Why not do both of these things? 
> Anything which is commonly requested or implemented should be
> suggested early in the docs.

'Tis definitely safe to change the default thresh_update and suggest the
cron job.  No conflict there!
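
As a rough sketch of what such a cron job might run, the Python below
builds and executes the two Berkeley DB utility commands.  The helper
names and the example directory are hypothetical.  It assumes the
utilities are installed as db_checkpoint and db_archive (some systems add
a version suffix), that '-1' forces a single checkpoint, that '-d' removes
log files no longer needed, and that '-h' names the database environment
directory.

```python
# Sketch only: helper names and the directory used below are hypothetical.
import subprocess


def maintenance_commands(db_home):
    """Return the checkpoint and log-cleanup commands for one environment."""
    home = str(db_home)
    return [
        # Write one checkpoint so that older log files become removable.
        ["db_checkpoint", "-1", "-h", home],
        # Delete log files that are no longer needed.
        ["db_archive", "-d", "-h", home],
    ]


def run_maintenance(db_home):
    """Run each command in order, raising an error if one fails."""
    for cmd in maintenance_commands(db_home):
        subprocess.run(cmd, check=True)
```

A nightly cron entry could then invoke a short script that calls
run_maintenance() on the user's bogofilter directory.  Anyone who wants to
keep the logfiles for recovery should copy them elsewhere before the
cleanup step runs.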

Regards,

David


