qdbm tuning (was: Experience with 0.15.2)

Stefan Bellon sbellon at sbellon.de
Mon Sep 8 20:56:26 CEST 2003


Matthias Andree wrote:
> Stefan Bellon <sbellon at sbellon.de> writes:

[snip]

> > 2) Matthias re-worked my QDBM code and replaced the Relic API with
> >    the Depot API. First of all, this makes a few includes that are
> >    still present in the code superfluous, but second and worse,
> >    it's a lot slower as the Relic API does some optimization for
> >    you that you have to do yourself if you use the Depot API. I
> >    have added those things now. See attached patch for
> >    src/datastore_qdbm.c.

> Why is the alignsize set to 16? We store 12-byte records at the
> moment, and 16 are "dangerously" aligned to presumed CPU cache lines
> so we'd better avoid it. 12 nicely skews the array mapping to CPU
> cache lines on anything that has cache lines longer than 32 bit and
> should improve cache efficiency without ever causing adverse effects.

Ok, will test that.

> I don't think we should stuff that re-organization code into
> db_set_setvalue, or is the dpbnum so cheap we can afford it?

Yes, it's cheap: dpbnum runs in O(1).

> I'd think we'd better use db_close to reorganize. Objections?

Yes. I register 100 MB of spam/ham in one go. Performance breaks down
completely if I don't reorganize every now and then. I'm just not sure
whether a used/buckets ratio of 1.25 is a good threshold value. I know
of papers that say you have to reorganize as soon as the ratio reaches
0.80.
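
To make the check concrete, here is a minimal sketch of the threshold
test in C. It is not the code from the attached patch:
needs_reorganize and REORG_THRESHOLD are hypothetical names, and the
counts are passed in as plain integers. It also assumes that "used"
means the number of stored records. In datastore_qdbm.c the two counts
would presumably come from the Depot calls dprnum and dpbnum, with
dpoptimize doing the actual reorganization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: 1.25 is the value under discussion,
 * 0.80 is the value some papers suggest. */
#define REORG_THRESHOLD 1.25

/* Returns true when the records-per-bucket ratio exceeds the threshold.
 * Assumption: "used" is the number of stored records, so the ratio
 * can grow beyond 1.0. */
bool needs_reorganize(long records, long buckets, double threshold)
{
    assert(records >= 0 && buckets > 0);
    return (double)records / (double)buckets > threshold;
}
```

The test is one division and one comparison, so calling it on every
write costs next to nothing as long as both counts are available in
constant time.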

You can work out the problem for yourself. If only 1913 buckets are
available at the beginning and I want to feed over 300000 words into
the word lists, then this is no good.
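
The arithmetic behind that, as a small C sketch: with 300000 words in
1913 buckets, each bucket holds roughly 157 words on average. The
helper names load_factor and min_buckets are made up for illustration
and are not part of QDBM or bogofilter.

```c
#include <assert.h>

/* Average number of records per bucket (the load factor). */
double load_factor(long records, long buckets)
{
    assert(buckets > 0);
    return (double)records / (double)buckets;
}

/* Smallest bucket count that keeps records/buckets at or below
 * max_ratio. Starts from a truncated estimate and steps upward,
 * so rounding in the division cannot leave the result too small. */
long min_buckets(long records, double max_ratio)
{
    assert(records >= 0 && max_ratio > 0.0);
    long n = (long)((double)records / max_ratio);
    if (n < 1)
        n = 1;
    while ((double)records / (double)n > max_ratio)
        n++;
    return n;
}
```

For example, min_buckets(300000, 1.25) gives the bucket count needed
to stay under the 1.25 threshold, and min_buckets(300000, 0.80) the
count for the stricter 0.80 threshold. Both are far above 1913.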

-- 
Stefan Bellon



