[long] Recovery handling in TXN branch

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Fri Aug 20 15:45:04 CEST 2004


On Fri, 20 Aug 2004, Matthias Andree wrote:

> > Would it be acceptable to set a reasonably long alarm (e.g. 30 seconds)
> > before any db operation (or group of them) and conclude the db is
> > deadlocked and needs recovery when the alarm expires?
> 
> I'd wondered about that. How do we figure an adequate timeout? How do we
> know what timeouts external software, for instance maildrop or Postfix,
> impose?

It should be long enough not to generate false alarms and short enough to
detect deadlocks in "real time". Something like 30-60 s looks like a
reasonable value.
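
To make the alarm idea concrete, here is a minimal sketch in Python (not Bogofilter code). The names db_watchdog and DeadlockSuspected are hypothetical, and the 30 s default is only the value suggested above. The handler touches nothing but the Python exception machinery, so no db call is made from signal context; the caller decides how to start recovery.

```python
import signal
from contextlib import contextmanager


class DeadlockSuspected(Exception):
    """Raised when the guarded db operation outlives the alarm (hypothetical name)."""


@contextmanager
def db_watchdog(seconds=30):
    """Arm SIGALRM around a db operation, or a group of them."""
    def on_alarm(signum, frame):
        # No db calls here: just unwind to the caller.
        raise DeadlockSuspected("db operation exceeded %d s" % seconds)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                            # disarm the alarm
        signal.signal(signal.SIGALRM, previous)    # restore the old handler
```

A caller would wrap the db work in "with db_watchdog(30): ..." and treat DeadlockSuspected as "the db is probably deadlocked and needs recovery". This only works in the main thread of the process, since that is where Python delivers signals.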

IMHO, timeouts imposed by external software do not really matter unless
the external software in question kills Bogofilter (and prevents it from
initiating the recovery) when its timeout expires.

> > - no signals (can work among processes running under different
> >   uids, no pid recycling races)
> 
> We'd still need internal signals such as SIGALRM to run the watcher
> every 30 seconds or so.

Yes, right. I wanted to say "no interprocess signals".

BTW: I think the periodic checks can be much more frequent. Perhaps every
5 seconds. Or perhaps even more frequent with some bound on the amount of
work done in every step (e.g. it checks 20 cells, works for a second,
checks 20 more cells, works for a second...). Moreover, every process can
start at a random offset (to avoid the situation when all running
processes scan the same part of APRT), and an aborting process might fill
APRT with 1's to make other processes detect the failure faster.
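
The bounded, randomly offset scan could be sketched like this in Python. It is only an illustration: scan_batches is a hypothetical helper, and it assumes the APRT cells are addressed by index 0..num_cells-1. The actual check of each cell is left to the caller.

```python
import random


def scan_batches(num_cells, batch_size=20, start=None):
    """Yield lists of cell indices that cover every cell exactly once.

    The scan begins at `start` (a random offset when not given) and wraps
    around, so concurrent processes tend to look at different parts of
    the table at the same moment.
    """
    if num_cells <= 0:
        return
    if start is None:
        start = random.randrange(num_cells)
    for done in range(0, num_cells, batch_size):
        count = min(batch_size, num_cells - done)
        yield [(start + done + i) % num_cells for i in range(count)]
```

The process would check one batch, do a second of real work, check the next batch, and so on, starting a new pass when the generator is exhausted.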

> No DB calls are permitted from signal handlers, BerkeleyDB is not
> re-entrant. We can only crash when we find that another process has
> exited uncleanly.

Isn't DB_PANIC_ENVIRONMENT almost useless if it cannot be set from a
signal handler?

> Also, setting DB_PANIC_ENVIRONMENT does not abort existing processes.

Hm. Even less useful.

> > The watcher might see a locked pid file but the process might exit
> > (unlocking the file) and its pid might be recycled before the watcher
> > kills it. It is rather unlikely but it can happen.
> 
> I don't see how that would be a problem, if we scan for "need recovery"
> before locking our own dbuser.$PID file.

1. The watcher (W) and two processes (P1 and P2) are running.
   dbuser.P1 and dbuser.P2 exist and are locked.
2. P1 crashes.
3. W sees an unlocked dbuser.P1 and a locked dbuser.P2 and prepares
   to kill P2.
4. P2 crashes.
5. Pid P2 is recycled (by a completely unrelated process knowing nothing
   about Bogofilter database and dbuser.$PID files).
6. W sends a signal to P2. Oops.

> > I think there is a race condition here. needs-recovery might be created
> > after this check (and before dbuser.$PID is created) and there might be
> > one process performing the recovery while another process attempts to use
> > the database in the usual way.
> 
> Maybe.

1. The watcher (W) and two processes (P1, P2) are running.
2. Another process (P3) starts, looks for needs-recovery but does not
   find it.
3. P1 crashes, W sees it, kills P2, creates needs-recovery.
4. Yet another process (P4) starts, sees needs-recovery,
   and prepares to start db recovery.
5. P3 creates dbuser.P3 and opens the database.
6. P4 starts db recovery and collides with P3.

> I wonder about how your single-file APRT is handled. Will it be grown
> on demand, i. e. if I don't find a cell, reopen with O_APPEND and
> write one?

There are two possible approaches:
1. Preallocate some fixed number of cells and limit the number
   of processes accessing the db by this number.
2. Use the space beyond the file's end as an unlimited supply of
   zero cells. (O_APPEND is not necessary, afaik it is possible to
   lock areas beyond the current EOF.)

IMHO the choice is a matter of personal taste: #2 is more flexible but
#1 is simpler, yet it can handle most cases as well (who needs more than
1000 concurrent processes? and even if one really needs to raise the
limit, a process holding an exclusive lock on LCKF can do it).
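
For approach #2, a small Python sketch of locking a cell beyond EOF with POSIX record locks. The helper try_claim_cell and the one-byte CELL_SIZE are assumptions made for the illustration, not the actual APRT layout.

```python
import fcntl
import os

CELL_SIZE = 1  # assumed: one byte per cell


def try_claim_cell(fd, index):
    """Try to take an exclusive, non-blocking lock on cell `index`.

    The byte range may lie beyond the current end of the file; taking
    the lock does not grow the file. Returns True on success and False
    when the lock cannot be acquired (typically because another process
    holds it).
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                    CELL_SIZE, index * CELL_SIZE, os.SEEK_SET)
    except OSError:
        return False
    return True
```

The descriptor has to be open for writing for an exclusive lock to be granted. These locks belong to the process, not the descriptor, so closing any descriptor for the file drops all of the process's locks on it.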


--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."



