massive disk space leak vs thresh_update

Matthias Andree matthias.andree at gmx.de
Sun Dec 12 19:25:24 CET 2004


Tom Allison <tallison at tacocat.net> writes:

> I would rather suggest addressing the potential problem inherent with
> the database.  You have to routinely clean up the logs and manage the
> database environment much more than you have in the past.  That's the
> nature of the beast.

I have thought about this a bit, and I think we should offer a mode
where bogofilter automatically removes the log files right away. Either
we do this automatically based on "time last run", or we offer a special
bogofilter mode that the user can put into her crontab so that it runs
regularly. I have also considered adding a "verify" or "check" mode,
where bogoutil could go through the database files, and reimplementing
some parts of the Berkeley DB utilities in bogoutil, because they are
usually only on the order of 3...30 lines of code. That saves the user
the hassle of finding and installing the utilities and allows us to wrap
proper locking around them, so the user doesn't need to take care to
stop her mail system.
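
A minimal sketch of the "time last run" idea, written in Python for
brevity and not taken from bogofilter's code: a stamp file records when
cleanup last ran, and cleanup is repeated only once a given interval has
passed. The names cleanup_due and run_cleanup_if_due, the stamp file and
the one-day interval are all hypothetical, and the cleanup callable
merely stands in for whatever would remove the unneeded log files.

```python
import os
import time

DAY = 24 * 60 * 60  # hypothetical interval between cleanups, in seconds


def cleanup_due(stamp_path, interval=DAY, now=None):
    """Return True if cleanup never ran or last ran >= interval seconds ago."""
    if now is None:
        now = time.time()
    try:
        last_run = os.path.getmtime(stamp_path)
    except FileNotFoundError:
        return True  # no stamp file yet: cleanup has never run
    return now - last_run >= interval


def run_cleanup_if_due(stamp_path, cleanup, interval=DAY, now=None):
    """Call cleanup() if it is due, then record the run time in the stamp file."""
    if now is None:
        now = time.time()
    if not cleanup_due(stamp_path, interval, now):
        return False
    cleanup()  # placeholder for the real log removal
    with open(stamp_path, "a"):
        pass  # create the stamp file if it does not exist yet
    os.utime(stamp_path, (now, now))  # mtime records the time of this run
    return True
```

The stamp is updated only after cleanup() returns, so a cleanup that
raises an exception is retried on the next run.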

> The option would be to disable transactions
> entirely and use something like procmail lock_files to manage things.

Care has been taken to make sure that the bf* and bogo* utilities
don't get in the way of each other even if run at the same
time. procmail's "local lock files" (i.e. using :0 OPTS:name or
:0 OPTS: rather than just :0 OPTS - note the trailing colon) should
NEVER be necessary for a *bogofilter* "recipe" of any kind - these are
by nature non-delivering "recipes" (as procmail calls them).

> Now, if you turn on '-u' and thresh_update you should greatly reduce the 
> amount of writing you do to the database anyways.  Personally, I only 
> train on error now and since this is done on a periodic cronjob, I don't 
> need transactional support at all.  There's only one writer at a time.

That must be a misunderstanding. Transactional support provides full
atomicity, consistency, isolation and durability guarantees. You said
you don't need isolation because you run the updates serially (and you
are getting atomicity halfway because the writer locks out all
readers), but "no transactions" also means "no durability, no
consistency and no full atomicity".
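
To illustrate just the atomicity part, here is a small sketch that uses
Python's standard sqlite3 module as a stand-in. bogofilter itself uses
Berkeley DB, so this is not its code, and the table, column and token
names as well as the fail_after hook are invented for the example. An
update that fails halfway inside a transaction is rolled back as a
whole, so the database never shows a half-applied change.

```python
import sqlite3


def update_counts(conn, updates, fail_after=None):
    """Apply all (token, delta) updates in one transaction, or none of them.

    fail_after is a hypothetical hook that simulates a crash once that
    many updates have been applied.
    """
    try:
        with conn:  # commits on success, rolls back on an exception
            for i, (token, delta) in enumerate(updates):
                if fail_after is not None and i == fail_after:
                    raise RuntimeError("simulated crash mid-update")
                conn.execute(
                    "UPDATE wordlist SET hits = hits + ? WHERE token = ?",
                    (delta, token),
                )
        return True
    except RuntimeError:
        return False


def demo():
    """Fail after the first of two updates and return what the table holds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wordlist (token TEXT PRIMARY KEY, hits INTEGER)")
    conn.executemany(
        "INSERT INTO wordlist VALUES (?, ?)", [("spam", 10), ("ham", 20)]
    )
    conn.commit()
    ok = update_counts(conn, [("spam", 1), ("ham", 1)], fail_after=1)
    rows = dict(conn.execute("SELECT token, hits FROM wordlist"))
    return ok, rows
```

In demo(), the first update has already been applied when the simulated
crash happens, yet both rows keep their original values afterwards.
Without a transaction, the first update would have stayed in place.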

> I've also added a crontab for 'db_archive -d' to clean up those logs.  I 
> suppose this could be better refined to: db_verify && db_archive to keep 
> things from getting really ugly.

With transactional support, db_verify should never be necessary.
Granted, db_verify might still fail after a crash, but it would then be
telling you "some component of your system, kernel, file system, disk
drive, didn't meet our expectations".

-- 
Matthias Andree


