[long] Recovery handling in TXN branch

Matthias Andree matthias.andree at gmx.de
Fri Aug 20 16:23:29 CEST 2004


Pavel Kankovsky <peak at argo.troja.mff.cuni.cz> writes:

> On Fri, 20 Aug 2004, Matthias Andree wrote:
>
>> > Would it be acceptable to set a reasonably long alarm (e.g. 30 seconds)
>> > before any db operation (or a group of them) and conclude the db is
>> > deadlocked and needs recovery when the alarm expires?
>> 
>> I'd wondered about that. How do we figure an adequate timeout? How do we
>> know what timeouts external software, for instance maildrop or Postfix,
>> impose?
>
> It should be long enough not to generate false alarms and short enough to
> detect deadlocks in "real time". Something like 30-60 s looks like a
> reasonable value.

I wouldn't use it to detect deadlocks, but only to run a periodic check
whether one of the processes has exited uncleanly. An "unclean exit"
would be the trigger for the checking process to abort.
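
To make this concrete, here is a rough, untested sketch of what I have
in mind. The file name and helpers are made up; the SIGALRM handler
only sets a flag, since we must not call into the DB from a signal
handler:

    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t check_due;

    /* install with sigaction(SIGALRM, ...) and call alarm(30) once */
    static void on_alarm(int sig)
    {
        (void)sig;
        check_due = 1;   /* the main loop does the actual probing */
        alarm(30);       /* re-arm; alarm() is async-signal-safe */
    }

    /* Returns 1 if path is still locked (owner alive), 0 if it exists
     * but is unlocked (unclean exit), -1 on error. */
    static int lockfile_is_held(const char *path)
    {
        struct flock fl;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        fl.l_type = F_WRLCK;          /* probe for any conflicting lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;                 /* 0 means the whole file */
        if (fcntl(fd, F_GETLK, &fl) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
        return fl.l_type != F_UNLCK;  /* F_UNLCK: nobody holds a lock */
    }

A dbuser.$PID file that still exists but is no longer locked would be
the "unclean exit" signal.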

> IMHO, timeouts imposed by external software do not really matter unless
> the external software in question kills Bogofilter (and prevents it from
> initiating the recovery) when its timeout expires.

That may happen.

> checks 20 more cells, works for a second...). Moreover, every process can
> start at a random offset (to avoid the situation when all running
> processes scan the same part of APRT), and an aborting process might fill
>    APRT with 1's to make other processes detect the failure faster.

I wonder if that's necessary. A thorough check every 30 s seems fine to
me. Bogofilter usually runs unattended, so no one will notice the delay.
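
(For reference, my reading of Pavel's scan, as a sketch; APRT_CELLS,
the mmap'ed cell array, and the "1 means failure" convention are all
invented for illustration:)

    #include <stdlib.h>

    #define APRT_CELLS 1000   /* assumed fixed preallocation */

    /* One pass over the APRT, starting at a random cell so that
     * concurrent scanners spread out instead of clustering. */
    static int aprt_scan(const char *cells)   /* mmap'ed APRT */
    {
        int start = rand() % APRT_CELLS, i;
        for (i = 0; i < APRT_CELLS; i++) {
            int cell = (start + i) % APRT_CELLS;
            if (cells[cell] == 1)   /* a process marked a failure */
                return cell;        /* caller triggers recovery */
        }
        return -1;                  /* all clear */
    }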

>> No DB calls are permitted from signal handlers; BerkeleyDB is not
>> re-entrant. We can only crash when we find that another process has
>> exited uncleanly.
>
> Isn't DB_PANIC_ENVIRONMENT almost useless if it cannot be set from a
> signal handler?

OK, the phrase in question
(BerkeleyDB.4.2/docs/ref/program/appsignals.html) is:

  "Because Berkeley DB is not re-entrant, the signal handler should not
  attempt to release locks and/or close the database handles
  itself. Re-entering Berkeley DB is not guaranteed to work correctly,
  and the results are undefined."

I read that as meaning the whole library isn't re-entrant.

>> Also, setting DB_PANIC_ENVIRONMENT does not abort existing processes.
>
> Hm. Even less useful.

True. :-/

>> I don't see how that would be a problem, if we scan for "need recovery"
>> before locking our own dbuser.$PID file.
>
> 1. The watcher (W), and two processes (P1 and P2) are running.
>    dbuser.P1 and dbuser.P2 exist and are locked.
> 2. P1 crashes.
> 3. W sees an unlocked dbuser.P1 and a locked dbuser.P2 and prepares
>    to kill P2.
> 4. P2 crashes.
> 5. Pid P2 is recycled (by a completely unrelated process knowing nothing
>    about Bogofilter database and dbuser.$PID files).
> 6. W sends a signal to P2. Oops.

Now I see it.
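
In code form, the window looks roughly like this (pids and the
prepare_recovery() helper are hypothetical; lockfile_is_held() is the
probe sketched above):

    #include <signal.h>    /* kill() */

    static void watcher_pass(void)
    {
        /* Step 3: P1's file exists but is unlocked, P1 found dead. */
        if (lockfile_is_held("dbuser.4711") == 0)
            prepare_recovery();    /* hypothetical helper */

        /* Steps 4-5 happen here: P2 (pid 4712) crashes, and the
         * kernel hands pid 4712 to an unrelated process. */

        kill(4712, SIGTERM);       /* step 6: hits the bystander */
    }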

>> I wonder how your single-file APRT is handled. Will it be grown on
>> demand, i.e. if I don't find a cell, reopen with O_APPEND and write
>> one?
>
> There are two possible approaches:
> 1. Preallocate some fixed number of cells and limit the number
>    of processes accessing the db by this number.
> 2. Use the space beyond the file's end as an unlimited supply of
>    zero cells. (O_APPEND is not necessary, afaik it is possible to
>    lock areas beyond the current EOF.)

> IMHO the choice is a matter of personal taste: #2 is more flexible but
> #1 is simpler, yet it can handle most cases as well (who needs more than
>    1000 concurrent processes? and even if one really needs to raise the
> limit, a process holding an exclusive lock on LCKF can do it).
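
For what it's worth, here's how I picture #2 (untested; it assumes
POSIX fcntl byte-range locks, which are allowed to extend past the
current EOF, and the function name is made up):

    #include <fcntl.h>

    /* Try to claim APRT cell `slot' with a one-byte write lock.
     * Returns 0 on success; F_SETLK fails with EACCES or EAGAIN
     * if another process already holds the byte. */
    static int claim_cell(int fd, off_t slot)
    {
        struct flock fl;
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = slot;     /* one byte per process slot */
        fl.l_len = 1;
        return fcntl(fd, F_SETLK, &fl);   /* non-blocking attempt */
    }

The kernel drops fcntl locks when the owner exits, cleanly or not,
which is what makes scanning for claimed-but-unlocked cells work in
the first place.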

Hm. I wonder whether processes that cannot get a cell should just pile
up waiting or exit right away. It is indeed a rather academic issue: do
we want the Perfect solution, or one that works in most but not all
extreme cases? The process count limit (the process table size) will
act as an external hard limit anyway, so we could use that figure as
the maximum number of cells.

I also wonder how scalable concurrent access in BerkeleyDB is. It
appears to do exponential backoff when it cannot get a .db-internal
lock, clamping the delay to a version-dependent upper bound. It's not
pressing right now, but I'd have to find a paper or do some research
here eventually.
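
For illustration, the generic pattern I have in mind (this is not
BerkeleyDB's actual code, and the bounds are made up):

    #include <unistd.h>

    /* Clamped exponential backoff around a non-blocking lock
     * attempt; try_lock() returning 0 means success. */
    static void backoff_until_locked(int (*try_lock)(void))
    {
        unsigned usec = 1000;                /* 1 ms initial delay */
        const unsigned max_usec = 1000000;   /* clamp at 1 second */
        while (try_lock() != 0) {
            usleep(usec);
            if (usec < max_usec / 2)
                usec *= 2;                   /* double each failure */
            else
                usec = max_usec;             /* ...up to the clamp */
        }
    }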

-- 
Matthias Andree

NOTE YOU WILL NOT RECEIVE MY MAIL IF YOU'RE USING SPF!
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)


