massive disk space leak vs thresh_update
David Relson
relson at osagesoftware.com
Mon Dec 13 02:00:05 CET 2004
On 12 Dec 2004 19:52:07 -0500
Tom Anderson wrote:
> On Fri, 2004-12-10 at 23:46, David Relson wrote:
>
> > As thresh_update only affects folks using '-u' and as it has
> > distinct benefits, I've been thinking that "thresh_update=0.01"
> > should become part of bogofilter's default configuration.
>
> I think that's a fine idea. I've been using -u since the beginning,
> and I've been using thresh_update to good effect since I upgraded to
> 0.92.8. I don't see that it harms even early training as highly
> polarized emails are fairly insignificant for further training anyway.
> With the new transactional version, it would seem that setting a
> default thresh_update would be a prudent move to prevent ill effects
> for newbies.
>
> > My first thought is to default thresh_update to 0.01. However the
> > default spam_cutoff is 0.99 and the two factors combined would block
> > autoupdating of spam (but not ham). A thresh_update of 0.005 should
> > work.
>
> I can't see that initially not updating on spam would be a terrible
> idea. In all likelihood, people will be training spams on error more
> than hams anyway, so it should balance out. Eventually they will set
> their spam_cutoff lower, at which point the thresh_update will provide
> autoupdating of spams too, unless they consciously change it.

Interesting thought. With the defaults of spam_cutoff=0.99,
ham_cutoff=0.10, and thresh_update=0.01, autoupdate ('-u') will register
only ham; spam will be registered only through manual action. That's
lopsided, a new and different situation that I'll need to think
about...

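To make the interaction concrete, here is a rough Python sketch of the decision as I understand it. It is not bogofilter's actual code: the function name autoupdate_action is made up for illustration, and it assumes that '-u' skips any message whose score lies within thresh_update of 0.0 or 1.0 (how the exact boundary values are treated is also an assumption).

```python
def autoupdate_action(score, spam_cutoff=0.99, ham_cutoff=0.10,
                      thresh_update=0.01):
    """Return 'spam', 'ham', or None (nothing registered) for one score.

    Hypothetical model of '-u' combined with thresh_update; the real
    bogofilter logic may differ in detail.
    """
    # Assumption: highly polarized scores are skipped entirely.
    if score <= thresh_update or score >= 1.0 - thresh_update:
        return None
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return None  # "unsure" messages are not registered either


if __name__ == "__main__":
    for s in (0.001, 0.05, 0.5, 0.992, 0.999):
        print(s, autoupdate_action(s), autoupdate_action(s, thresh_update=0.005))
```

Under these assumptions, with thresh_update=0.01 every score at or above spam_cutoff=0.99 also falls in the skipped band, so only ham gets registered. With thresh_update=0.005 there is a narrow window (0.99 up to 0.995) in which spam is still registered, and lowering spam_cutoff widens that window.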
> > A second thought is to suggest adding a cron job to run
> > db_checkpoint and/or db_archive. People who don't want logfiles
> > using lots of disk space won't want to save the logfiles, so letting
> > Berkeley DB delete them is reasonable.
>
> Is this necessarily a dichotomy? Why not do both of these things?
> Anything which is commonly requested or implemented should be
> suggested early in the docs.

'Tis definitely safe to change the default thresh_update and suggest the
cron job. No conflict there!

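As a rough idea of what such a cron job might run, here is a small Python sketch. The function names are made up for illustration, and the directory ~/.bogofilter is an assumption about where the database environment lives. It relies only on the standard Berkeley DB utilities: "db_checkpoint -1" forces a single checkpoint and "db_archive -d" removes log files that are no longer needed.

```python
import os
import subprocess


def maintenance_commands(home):
    """Build the Berkeley DB utility invocations for one environment.

    'home' is the database environment directory (assumed here to be
    the bogofilter directory).
    """
    home = os.path.expanduser(home)
    return [
        # Force one checkpoint so older log files become removable.
        ["db_checkpoint", "-1", "-h", home],
        # Delete log files that are no longer needed.
        ["db_archive", "-d", "-h", home],
    ]


def run_maintenance(home="~/.bogofilter"):
    """Run each command in order; raise if one of them fails."""
    for cmd in maintenance_commands(home):
        subprocess.run(cmd, check=True)
```

A script that calls run_maintenance() could then be scheduled from cron, say once a day. Anyone who wants to keep the logfiles for catastrophic recovery should copy them elsewhere before letting db_archive delete them.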
Regards,
David