DB corruption within minutes

Gyepi SAM gyepi at praxis-sw.com
Sat Jan 11 20:14:10 CET 2003


On Sat, Jan 11, 2003 at 01:59:05PM +0100, Matthias Andree wrote:
> On Sat, 11 Jan 2003, Gyepi SAM wrote:
> > > I have added t.lock2 and made minor fixes to bogofilter.
> > 
> > I noticed. I also changed the grind loop to run for 1000 iterations,
> > and found no problems.
> 
> Even 8 did the job on my system SuSE Linux 8.1, Duron/700? UWSCSI hard
> drive, DB-4.0.14, Kernel 2.4.19, ext3fs.
> 
> What's the difference to your system? Slower machine? Slower hard drive?
> Faster machine? different OS? different DB version?

Red Hat 7.1, Athlon 1.3 GHz, IDE drives, kernel 2.4.19, ext2.

The machine used to be slower (PII-400), and I have tested with both
db 3.17 and 4.0.14 with the same result.

> > The only solutions I can think of are
> > 1. call open (2) on the database ourselves, so we have a handle to lock
> > 2. use an external lockfile.
> 
> An external lockfile is the global lock we don't want to use for
> scalability reasons. I've also thought about integrating the locking with
> db_open.

It would not be a global lock file; there would be one per database, and it
would be locked the same way we lock the databases now, except that we could
unlock it after the database is closed. I still think solution 1 is better,
though. Fewer changes, too.
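
To make the per-database lock file idea concrete, here is a minimal sketch.
It assumes fcntl(2) advisory locks and a lock file named after the database
with ".lock" appended; both the naming scheme and the helper names
db_lockfile_acquire/db_lockfile_release are hypothetical, not existing
bogofilter code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive lock on "<db_path>.lock".
 * Blocks until the lock is granted.
 * Returns the lock file descriptor, or -1 on error. */
int db_lockfile_acquire(const char *db_path)
{
    char lock_path[4096];
    struct flock fl;
    int fd;
    int n;

    /* Build the per-database lock file name (assumed naming scheme). */
    n = snprintf(lock_path, sizeof lock_path, "%s.lock", db_path);
    if (n < 0 || (size_t)n >= sizeof lock_path)
        return -1;

    fd = open(lock_path, O_RDWR | O_CREAT, 0664);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;      /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "to end of file", i.e. the whole file */

    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: release the lock and close the descriptor.
 * Meant to be called only after the database itself has been closed.
 * Returns 0 on success, -1 on error. */
int db_lockfile_release(int fd)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;

    rc = fcntl(fd, F_SETLK, &fl);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

The intended order would be: acquire the lock file, open the database, do
the work, close the database, then release the lock file. Since each
database has its own lock file, processes working on different databases
would not block each other.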
 
> > > Plus, I believe we cannot release the lock, have someone else update the
> > > db and then grab the lock again to proceed. The pages may have changed,
> > > so we have inconsistent cache/disk data.
> > 
> > If we call db_sync() after updating the database but before releasing
> > the lock, that should fix any synchronization problems of that sort.
> 
> That's only one half. The other half would be to make the other data
> bases that have waited for the lock flush /their/ caches, but I don't
> currently see how that would be done other than with DB->close and
> DB->open.

I don't think that's necessary, since the other databases that have waited
for the lock would not have modified their databases before acquiring the
lock so there should be nothing to flush.
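
For reference, the writer-side ordering from my earlier suggestion (flush
before releasing the lock) looks roughly like the sketch below. To keep it
self-contained it uses a plain file with write(2) and fsync(2) standing in
for the Berkeley DB update and sync calls; locked_append and set_lock are
hypothetical names, and the fcntl(2) locking is an assumption.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply an fcntl lock of the given type (F_WRLCK or F_UNLCK)
 * to the whole file, blocking if necessary. */
static int set_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: whole file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical update routine: lock, write, flush, and only then unlock.
 * Returns 0 on success, -1 on error. */
int locked_append(const char *path, const char *data, size_t len)
{
    int rc = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0664);

    if (fd < 0)
        return -1;

    if (set_lock(fd, F_WRLCK) == 0) {
        /* The flush happens while the lock is still held, so the next
         * process to get the lock sees the completed update on disk. */
        if (write(fd, data, len) == (ssize_t)len && fsync(fd) == 0)
            rc = 0;
        if (set_lock(fd, F_UNLCK) != 0)
            rc = -1;
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

This only covers the writer's half. Whether a waiting process can still
hold stale cached pages from before it got the lock is the open question
above, and this sketch does not settle it.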

> It's sort of euhm unhelpful if you cannot reproduce the problem, because
> it seems you have most experience with BDB of all active bogofilter
> hackers. I seem to have grasped the basics though.

Yes, it does seem strange that I cannot reproduce the problem, but I'll keep
working on it.

-Gyepi



