[long] Recovery handling in TXN branch

Matthias Andree matthias.andree at gmx.de
Sun Aug 15 00:25:47 CEST 2004


On Sat, 14 Aug 2004, David Relson wrote:

> The BerkeleyDB background page at
> http://www.sleepycat.com/docs/ref/transapp/app.html seems oriented
> towards long-running processes, i.e. processes for which a big startup
> cost (like running recovery every time) is acceptable.  Since bogofilter
> may run multiple times per second that approach doesn't seem a good fit.
> 
> Your locking ideas are comprehensive and complex.  Being complex
> increases the chances of problems.  Something simpler may well be
> appropriate.

I'd hope so and have thought about that but haven't come up with
anything simpler yet, unfortunately.

> Most bogofilter runs are for message scoring which only requires
> database reading.

> Writing the database when registering messages is the
> unusual case.  It seems that bogofilter needs a solution that protects
> those writes.

The distinction doesn't work for "Concurrent" and "Transactional"
databases. These place read locks on certain regions of the file, pages
for instance. If these read locks aren't cleared through an orderly
shutdown, a subsequent process can get into a non-detectable deadlock.
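
For illustration, opening such a transactional environment with the
BerkeleyDB C API looks roughly like this (an untested sketch, the
environment path is made up, and this is not bogofilter's actual code).
The DB_RECOVER flag is what discards stale locks, and it may only be
used while no other process has the environment open - which is exactly
the coordination problem we are discussing:

#include <db.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DB_ENV *dbenv;
    int ret;

    if ((ret = db_env_create(&dbenv, 0)) != 0) {
        fprintf(stderr, "db_env_create: %s\n", db_strerror(ret));
        return EXIT_FAILURE;
    }

    /* DB_RECOVER runs normal recovery before the environment is
     * opened, discarding stale locks left behind by crashed or
     * killed processes.  It requires exclusive access to the
     * environment. */
    ret = dbenv->open(dbenv, "/var/spool/bogofilter",
                      DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
                      DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER, 0666);
    if (ret != 0) {
        dbenv->err(dbenv, ret, "DB_ENV->open");
        (void)dbenv->close(dbenv, 0);
        return EXIT_FAILURE;
    }

    (void)dbenv->close(dbenv, 0);
    return EXIT_SUCCESS;
}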

> As you know my mail server runs postfix and procmail.  /etc/procmailrc
> uses procmail's file locking when it runs "bogofilter -u" and this has
> been reliable.  I wonder if we can do something comparable for
> transaction locking.  Here's an idea:
> 
> Create a bogolock program that gets the global bogofilter lock, then
> forks to bogofilter or bogoutil, and waits for completion.  If the
> forked process is successful, all is well and the lock is released.  If
> the forked process fails, then bogolock can take the appropriate
> recovery actions.

The recovery action involves killing all processes that access the same
environment, then running recovery, and then restarting the failed
process (the restart is implicit if the mail delivery agent - maildrop
or procmail - maps bogofilter's error exit to EX_TEMPFAIL, because the
mail transfer agent will then try again later).
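
The procmail side of that mapping would look roughly like this (an
untested sketch; 75 is EX_TEMPFAIL from sysexits.h, and -p/-e make
bogofilter pass the message through and reserve non-zero exits for
errors):

:0fw
| bogofilter -u -e -p

# if bogofilter failed, hand the mail back to the queue;
# the MTA will retry delivery later
:0e
{ EXITCODE=75 HOST }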

The other difficulty is that we don't have a global lock - locking is
handled by libdb and is rather fine-grained. Two processes can happily
update tokens in different pages of the database at the same time; the
only part that usually gets serialized is the message count token.
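
To illustrate why that one token serializes writers, here is a minimal
sketch - not bogofilter's actual code or record layout, just assuming a
plain u_int32_t counter stored under a ".MSG_COUNT" key: every
registering process has to read-modify-write the same record inside its
transaction, so the lock on that record's page is the single point of
contention.

#include <db.h>
#include <string.h>

/* Increment the message count inside an existing transaction.  DB_RMW
 * takes a write lock immediately, so concurrent registrations queue up
 * here even when their token updates touch different pages. */
static int bump_msg_count(DB *db, DB_TXN *txn, u_int32_t delta)
{
    DBT key, val;
    u_int32_t count = 0;
    int ret;

    memset(&key, 0, sizeof key);
    memset(&val, 0, sizeof val);
    key.data = ".MSG_COUNT";
    key.size = sizeof ".MSG_COUNT" - 1;
    val.data = &count;
    val.ulen = sizeof count;
    val.flags = DB_DBT_USERMEM;

    ret = db->get(db, txn, &key, &val, DB_RMW);
    if (ret != 0 && ret != DB_NOTFOUND)
        return ret;

    count += delta;
    memset(&val, 0, sizeof val);
    val.data = &count;
    val.size = sizeof count;
    return db->put(db, txn, &key, &val, 0);
}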

And we still need to handle a crash of the bogolock program itself: its
premature termination MUST trigger a database recovery.

In practice, the deadlock situations I have observed have always
happened across a hasty reboot, so I think that is the common cause of
such deadlocks.

> Years ago when I was coding telecom daemons that needed to run 24x7
> non-stop, my first effort involved a watchdog process.  Occasionally the
> watchdog ended up killing a live daemon and getting all tangled up.  The
> final, working solution was having the program fork with the forked
> process doing all the work and the parent process waiting for the death
> of the forked process so it could be restarted.  Having the simple
> control level worked and worked well.

I know it will work well, but I'm still unclear how the requirements can
be integrated. If the system crashes, the parent process is gone too and
can take no further action, so the bogolock parent would have to save
state information in a way that allows crashes to be detected afterwards.
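
One way such state could look - a rough, untested sketch with a made-up
marker path, not a claim about how bogolock should actually work: the
wrapper drops a marker file before starting the child and removes it
only after a clean exit, so a marker that is still present at the next
start means the previous run (or the whole machine) went down hard and
recovery is needed.

#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MARKER "/var/spool/bogofilter/.needs-recovery"  /* made-up path */

int main(int argc, char **argv)
{
    pid_t pid;
    int fd, status;

    if (argc < 2) {
        fprintf(stderr, "usage: bogolock command [args...]\n");
        return 64;                      /* EX_USAGE */
    }

    /* A leftover marker means the previous run never finished cleanly;
     * a real wrapper would run recovery (db_recover) at this point,
     * this sketch merely reports it. */
    if (access(MARKER, F_OK) == 0)
        fprintf(stderr, "bogolock: stale marker found, recovery needed\n");

    if ((fd = open(MARKER, O_CREAT | O_WRONLY, 0600)) == -1)
        return 75;                      /* EX_TEMPFAIL */
    close(fd);

    if ((pid = fork()) == 0) {
        execvp(argv[1], argv + 1);      /* run bogofilter or bogoutil */
        _exit(127);
    }
    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        return 75;

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        unlink(MARKER);                 /* clean completion */
        return 0;
    }
    /* child failed or was killed: keep the marker, report tempfail */
    return 75;
}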

I'll sleep on this a bit and hope to see more clearly WRT integration
later.

> Another approach is running bogofilter as a daemon.  This would be a
> client-server design, as has been suggested in the past, and would have
> the advantage of the bogofilter daemon running 24x7 which would run
> faster since database opening wouldn't be needed for every message.

I wonder if inter-process communication is really more efficient than an
open() syscall.

> Cached reading would also be faster.

I doubt that. Caching is pretty much in the hands of the OS itself, and
operating systems, free and commercial alike, are developed with
database workloads in mind. The only question is whether the data ends
up in the buffer cache or the page cache, depending on the OS in
question.

> The server could be bogofilter
> itself or it could be a database layer used by bogofilter and bogoutil
> or the server.  Again, fork() could be used to control the daemon to
> keep it running.

BerkeleyDB has an RPC (remote procedure call) interface; I have more
than once wondered how efficient that would be. I fear context-switch
overhead here.

-- 
Matthias Andree

NOTE YOU WILL NOT RECEIVE MY MAIL IF YOU'RE USING SPF!
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)


