[long] Recovery handling in TXN branch

Sat Aug 14 13:42:47 CEST 2004

David,
dear readers,

the TXN code is, as you may be aware, not yet complete; it still
lacks code to

- detect when recovery of the database environment is needed
- perform this recovery

In practice, this causes bogofilter processes to lock up some time after
a bogofilter, bogoutil or similar process crashes or is killed by a
signal while holding a lock.

Background information is at
http://www.sleepycat.com/docs/ref/transapp/app.html

That document suggests two ways to handle the problem:

1. create a "parent" process that spawns the child processes

2. use a "watcher" or "sentinel" process that DB users need to register
   their PID with.

Problem with 1:
we would have to pass the file descriptors from the process that was run
to the supervisor's child process. Wietse Venema ran into Linux bugs
with file descriptor passing in Postfix, and I don't think I understand
UNIX well enough to attempt this. It would also require three processes:
the one that was run, the supervisor parent process and its child, which
is not really close to what we're doing today. A simpler version could
just act as a proxy and run three asynchronous sendfile() or
mmap()/read()-and-write() loops, but that would be inefficient.

Problem with 2:
on busy servers, and particularly on faster and more secure ones, the
watcher process has no reliable way of checking whether the system has
reassigned the PID to a different process. Scenario:

bogofilter registers PID #12345 with the watcher
#12345 crashes or is killed by a signal
OS reassigns PID #12345 to, for instance, an apache httpd
watcher checks for #12345 and thinks it's still alive.

This is a real-world problem.
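To make the scenario concrete: a PID-registering watcher would most
likely test liveness with kill(pid, 0), which only performs the
existence and permission checks and sends no signal. The minimal C
sketch below (pid_looks_alive() is a hypothetical name, not bogofilter
code) shows why that is not enough: a reused PID owned by another user
(EPERM) and one owned by the same user (success) both look exactly like
a live bogofilter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Naive liveness test, as a PID-registering watcher might do it.
 * Signal 0 means: check that the PID exists, deliver nothing. */
static bool pid_looks_alive(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return true;    /* some process we may signal has this PID */
    /* EPERM: a process exists but belongs to another user, e.g. the
     * apache httpd that inherited the PID. Still reported "alive". */
    return errno == EPERM;
}
```

Nothing in the result tells the watcher whose process it found, which is
exactly the gap described above.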

Conclusion: we need to find our own solution in bogofilter unless we
want a "run recovery on every process start" model.

The problems we have are:

#1 detect an unclean exit
    a - in a live system QUICKLY (ok, a minute should suffice)
    b - after a system crash

#2 in case of an unclean exit, abort all other bogofilter processes
   accessing the same database
   (DB_ENV->set_flags(env, ...DB_PANIC_ENVIRONMENT) should work)

#3 have a lock mechanism to make sure only one process can run recovery
   at a time

#1 is IMHO the most complex matter.

My current idea is to add a subdirectory "process-control" under the
bogodir, with mode 0700 and containing these files:

watcher-lock    [persistent]
need-recovery   [transient]
dbuser.$PID     [transient]
dbuser.$PID.new [transient]
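A minimal sketch of setting up that directory, assuming POSIX and a
hypothetical helper name ensure_control_dir(). It refuses a symlink or a
directory owned by someone else, and it forces mode 0700 explicitly
because mkdir() applies the umask to the requested mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure <bogodir>/process-control exists, is a
 * real directory owned by us and has mode 0700. The resulting path is
 * written to out (outlen bytes). Returns 0 on success, -1 on error. */
static int ensure_control_dir(const char *bogodir, char *out, size_t outlen)
{
    struct stat st;
    int n;

    assert(bogodir != NULL && out != NULL);
    n = snprintf(out, outlen, "%s/process-control", bogodir);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                      /* path would be truncated */
    if (mkdir(out, 0700) != 0 && errno != EEXIST)
        return -1;
    if (lstat(out, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;                      /* missing, symlink or plain file */
    if (st.st_uid != geteuid())
        return -1;                      /* refuse someone else's directory */
    /* mkdir() honours the umask, so enforce exactly 0700 here */
    if ((st.st_mode & 07777) != 0700 && chmod(out, 0700) != 0)
        return -1;
    return 0;
}
```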

Upon startup, every bogofilter process forks a watcher process and waits
for its exit.

The watcher first scans for any dbuser.$PID file it can obtain a lock
on. If it finds one, a process has exited uncleanly (see below). In that
case, the watcher creates the need-recovery file, unlinks the
dbuser.$PID file, sets the "panic" flag in the environment, and then
kills all processes that hold a lock on the db* files and removes their
dbuser.$PID files. The processes should probably catch the signal used,
run db_txn_abort() and exit somewhat gracefully.
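Berkeley DB calls are not async-signal-safe, so "catch the signal and
run db_txn_abort()" would in practice mean setting a flag in the handler
and aborting from the normal code path. A sketch of that pattern,
assuming SIGTERM as the signal (the proposal does not name one) and the
hypothetical names abort_requested and install_abort_handler():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the worker between DB operations. */
static volatile sig_atomic_t abort_requested = 0;

static void on_abort_signal(int sig)
{
    (void)sig;
    abort_requested = 1;    /* only async-signal-safe work in here */
}

/* Install on_abort_signal() for sig; returns 0 on success, -1 on error. */
static int install_abort_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_abort_signal;
    sigemptyset(&sa.sa_mask);
    /* no SA_RESTART: blocking calls interrupted by sig fail with EINTR,
     * so the worker notices the request promptly */
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Worker side, in outline:
 *     if (abort_requested) {
 *         abort the open transaction (db_txn_abort() in the TXN branch),
 *         close the environment and exit with a failure status;
 *     }
 */
```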

It then detaches itself from the parent and tries to create and lock the
watcher-lock file exclusively (non-blocking). If the lock is denied,
some other watcher is running, so this watcher can exit immediately. If
the lock is granted, the watcher detaches itself and starts supervising
(see below).
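The exclusive, non-blocking watcher-lock could look like the sketch
below. It assumes POSIX fcntl() record locks (the proposal does not say
which lock primitive to use), and become_watcher() is a hypothetical
name. Two properties of fcntl() locks matter here: they are not
inherited across fork(), so the lock must be taken after the watcher has
detached, and they are released as soon as the process closes any
descriptor for the file, so the descriptor must stay open for the
watcher's whole lifetime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Non-blocking exclusive fcntl() lock on the whole file behind fd. */
static int try_write_lock(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file": whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns a descriptor holding the watcher-lock (keep it open), -2 if
 * another watcher already holds the lock, or -1 on other errors. */
static int become_watcher(const char *lockpath)
{
    int fd, saved;

    assert(lockpath != NULL);
    fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (try_write_lock(fd) == 0)
        return fd;              /* we are the one and only watcher */
    saved = errno;
    close(fd);
    /* POSIX allows either EACCES or EAGAIN for "lock held elsewhere" */
    return (saved == EACCES || saved == EAGAIN) ? -2 : -1;
}
```

A return value of -2 is the "some other watcher is running" case, in
which this watcher simply exits.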

Next, the bogofilter process checks whether the need-recovery file
exists and, if it does, requests a blocking exclusive lock on that file.
Once that lock is granted, it checks whether the file still exists and,
if so, performs recovery and removes the file. If the file no longer
exists, someone else has already run recovery, so the process releases
the lock and proceeds.
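A sketch of this check, again assuming fcntl() locks. Besides the
existence test, it compares device and inode numbers, so that a
need-recovery file that was removed and re-created while we waited is
not confused with the one we locked. handle_need_recovery() and
placeholder_recover() are hypothetical; a real recovery step might, for
example, reopen the environment with the DB_RECOVER flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

typedef int (*recover_fn)(void *ctx);   /* returns 0 on success */

/* Stand-in for the real recovery step; here it only counts its calls. */
static int placeholder_recover(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

/* Returns 1 if this process ran recovery, 0 if none was pending (or
 * another process already handled it), -1 on error. */
static int handle_need_recovery(const char *path, recover_fn recover,
                                void *ctx)
{
    assert(path != NULL && recover != NULL);
    for (;;) {
        struct stat fst, pst;
        struct flock fl;
        int rc, fd = open(path, O_RDWR);

        if (fd < 0)
            return errno == ENOENT ? 0 : -1;    /* nothing pending */
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;                 /* whole file */
        if (fcntl(fd, F_SETLKW, &fl) != 0) {    /* blocking */
            close(fd);
            return -1;
        }
        /* Holding the lock: is the file we locked still at path? */
        if (fstat(fd, &fst) == 0 && stat(path, &pst) == 0
            && fst.st_dev == pst.st_dev && fst.st_ino == pst.st_ino) {
            rc = recover(ctx);
            if (rc == 0)
                unlink(path);   /* remove while still holding the lock */
            close(fd);          /* closing releases the lock */
            return rc == 0 ? 1 : -1;
        }
        close(fd);              /* removed or replaced meanwhile: retry */
    }
}
```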

Then, the bogofilter process creates the dbuser.$PID.new file, locks it
exclusively, renames it to dbuser.$PID, opens the environment, performs
its work, closes the environment, and finally removes the dbuser.$PID
file and unlocks it, in that order.
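The rename step ensures that a dbuser.$PID file is never visible
without its lock: the lock belongs to the open file, so it survives
rename(). A minimal sketch with hypothetical function names, assuming
fcntl() locks:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<dir>/dbuser.<pid><suffix>"; returns 0, or -1 if truncated. */
static int dbuser_path(char *buf, size_t len, const char *dir,
                       pid_t pid, const char *suffix)
{
    int n = snprintf(buf, len, "%s/dbuser.%ld%s", dir, (long)pid, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Create dbuser.$PID.new, lock it, rename it to dbuser.$PID.
 * Returns the descriptor that holds the lock, or -1 on error. */
static int dbuser_register(const char *dir)
{
    char tmp[4096], final[4096];
    struct flock fl;
    int fd;

    assert(dir != NULL);
    if (dbuser_path(tmp, sizeof tmp, dir, getpid(), ".new") != 0
        || dbuser_path(final, sizeof final, dir, getpid(), "") != 0)
        return -1;
    fd = open(tmp, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;             /* whole file */
    if (fcntl(fd, F_SETLK, &fl) != 0 || rename(tmp, final) != 0) {
        unlink(tmp);
        close(fd);
        return -1;
    }
    return fd;          /* the lock follows the inode across rename() */
}

/* After closing the environment: remove the file first, then unlock. */
static int dbuser_unregister(const char *dir, int fd)
{
    char final[4096];
    int rc = -1;

    if (dbuser_path(final, sizeof final, dir, getpid(), "") == 0)
        rc = unlink(final);
    close(fd);          /* closing the descriptor releases the lock */
    return rc;
}
```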

The watcher process then sleeps for a minute and scans as described
above. If a process has exited uncleanly, the environment is marked as
needing recovery and the remaining processes are aborted.
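One scan pass of the watcher might look like the following sketch
(hypothetical names, fcntl() locks assumed). When a lock is free, it
also checks that the file is still linked under its name: since a clean
exit unlinks the file before dropping the lock, a file that disappeared
in the meantime belongs to a process that exited normally. Files ending
in .new are skipped, because they are unlocked for a moment by design
between creation and locking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* One pass over the process-control directory: remove every
 * dbuser.$PID file whose lock nobody holds any more and return how
 * many were found, or -1 if the directory cannot be read. */
static int scan_dbusers(const char *dir)
{
    DIR *d;
    struct dirent *de;
    int stale = 0;

    assert(dir != NULL);
    if ((d = opendir(dir)) == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        const char *name = de->d_name;
        size_t len = strlen(name);
        char path[4096];
        struct stat fst, pst;
        struct flock fl;
        int fd, n;

        if (strncmp(name, "dbuser.", 7) != 0)
            continue;
        if (len > 4 && strcmp(name + len - 4, ".new") == 0)
            continue;                   /* not yet locked and renamed */
        n = snprintf(path, sizeof path, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;
        if ((fd = open(path, O_RDWR)) < 0)
            continue;                   /* already gone: clean exit */
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;            /* needs an fd open for writing */
        fl.l_whence = SEEK_SET;         /* whole file */
        /* Lock granted and still linked under this name: the owner
         * died without unlinking (clean exits unlink, then unlock). */
        if (fcntl(fd, F_SETLK, &fl) == 0
            && fstat(fd, &fst) == 0 && stat(path, &pst) == 0
            && fst.st_dev == pst.st_dev && fst.st_ino == pst.st_ino) {
            unlink(path);
            ++stale;
        }
        close(fd);
    }
    closedir(d);
    return stale;
}
```

The watcher would run this once at startup and then in a loop around
sleep(60); a positive return value is the point where, per the
proposal, it creates need-recovery, sets DB_PANIC_ENVIRONMENT and aborts
the remaining processes. The sketch ignores the tiny window in which a
reused PID could rename a fresh dbuser.$PID into place between the
check and unlink().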

This protocol allows the watcher to check whether a PID has been reused
by a different process (in that case, nobody holds the lock on
dbuser.$PID), which allows us to a. detect that our own process is gone
and b. avoid killing other processes that aren't related to us.

Please send comments on this protocol. If you see how it can be
simplified, let me know.

Cheers,

-- 
Matthias Andree

NOTE YOU WILL NOT RECEIVE MY MAIL IF YOU'RE USING SPF!
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)