Ideas wanted for TXN and Concurrent store recovery handling

Tom Anderson tanderso at oac-design.com
Mon Jul 26 22:51:15 CEST 2004


From: "Matthias Andree" <matthias.andree at gmx.de>
>
> I'm looking for ideas for handling recovery in the
> transactional/concurrent branches of bogofilter.
>
> These branches use the native locking mechanism of the Berkeley DB for
> efficiency. It may happen in a crash or with forced abortion of a
> bogofilter process that data base locks aren't cleared, and all
> subsequent attempts to run bogofilter will then wait for the release of
> a lock that will never happen. The remedy is to stop all
> bogofilter/bogoutil processes, prevent new ones from being started by
> stopping the mail system, then run db_recover -h .bogofilter, and
> restart the mail system.
>
> My goal is to have bogofilter detect such timeouts itself and set a
> marker so it can run recovery, in order to not get stuck for a long
> time. The traditional code used fcntl-style locks that clear
> automatically when the application holding them quits, either orderly or
> through crash or force.
>
> I currently see two approaches to attack the problem:
>
> 1. We can use a timer. This raises a new problem: how do I figure if the
>    operation is just slow or is stuck? If anyone knows a way to figure,
>    please speak up.
>
> 2. We can use a sophisticated lock protocol for any bogofilter process
>    that is about to open the data base, to make sure that only one
>    process can run the recovery process, and run the recovery process as
>    part of bogofilter's startup sequence whenever an exclusive lock can
>    be acquired; a process that was about to do recovery would try an
>    exclusive lock, a process that will just read from or modify the data
>    base can use a shared lock (to avoid running recovery in the middle
>    of another process modifying the data base).
>
> Ideas are welcome. Please direct your replies to the
> bogofilter-dev at bogofilter.org mailing list.
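
Approach 2 could look roughly like the sketch below. It is only an
illustration under my own assumptions: it uses flock() on a side lock file
as a stand-in for whatever lock the real code would take, and the names
open_with_recovery, lock_path and recover are made up. recover would be
whatever runs the equivalent of db_recover.

```python
import fcntl
import os


def open_with_recovery(lock_path, recover):
    """Take the startup lock, running recover() first if nobody else is active.

    Returns (fd, recovered). The fd holds a shared lock; close it to release.
    lock_path and recover are hypothetical names used for illustration.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Non-blocking exclusive attempt: this succeeds only if no other
        # process currently holds a shared or exclusive lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Somebody else is using the data base, so recovery must not run.
        # Wait for a shared lock and carry on as a normal reader/writer.
        fcntl.flock(fd, fcntl.LOCK_SH)
        return fd, False
    # We are alone: it is safe to run recovery now.
    recover()
    # Convert to a shared lock so other processes can start. Note that
    # flock() does not guarantee this conversion is atomic.
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd, True
```

This does not detect a hung process by itself. It only makes sure recovery
never runs while another process holds the shared lock, and that it does run
the next time a process finds itself alone.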

I'm not familiar with Berkeley DB or how it locks.  You said you're
currently using native DB locking, but are you looking for an alternative or
a better way to use Berkeley DB?  You could record the process id in a
separate lockfile.  Use a generic timeout period, and then check whether that
process is still running after that time.  If so, continue to wait.  If not,
change the process id to the current process (retaining the lock) and start
your recovery process.  I don't know if that can be done atomically; System V
semaphores (sys/sem.h) might help there.  What if it's hung, but still
running?  Perhaps you can ping the running process by raising a signal.  I
don't know if any of that helps or if it's just a mental fart.
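
To make the lockfile idea a bit more concrete, here is a minimal sketch,
assuming a POSIX system. All the function names, the timeout and the poll
interval are made up for illustration; none of this is bogofilter code.
It creates the lockfile with O_EXCL so that creation is atomic, and probes
the recorded process with signal 0, which checks for existence without
actually delivering a signal.

```python
import errno
import os
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def try_acquire(lockfile):
    """Atomically create lockfile containing our pid. True on success."""
    try:
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def holder_is_dead(lockfile):
    """Return True if the lockfile names a pid that no longer exists."""
    try:
        with open(lockfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # lock vanished or is half-written: not provably dead
    return not pid_alive(pid)


def wait_for_lock(lockfile, timeout=30.0, poll=0.5):
    """Return "acquired", or after the timeout "stale" (holder died)
    or "busy" (holder still running, so the caller may keep waiting)."""
    deadline = time.monotonic() + timeout
    while True:
        if try_acquire(lockfile):
            return "acquired"
        if time.monotonic() >= deadline:
            return "stale" if holder_is_dead(lockfile) else "busy"
        time.sleep(poll)
```

On "stale" the caller would take over the lockfile and run recovery. That
takeover step is not atomic in this sketch: two waiters can both see the
dead holder at the same moment, so it would still need something like a
rename() or a second lock around it. It also says nothing about a holder
that is hung but still running.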

Tom



> -- 
> Matthias Andree
>
> Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)
> _______________________________________________
> Bogofilter mailing list
> Bogofilter at bogofilter.org
> http://www.bogofilter.org/mailman/listinfo/bogofilter
>



