DB backend support for lmdb?

Matthias Andree matthias.andree at gmx.de
Mon May 28 20:54:24 CEST 2018


On 26.05.2018 at 23:43, Steffen Nurpmeso wrote:

>  |If you have figures to prove that SQLite is slower by factors (beyond
>  |3), I'd like to see them; other than that we've had D. Richard Hipp look
>  |over our implementation and it's optimized quite a bit already -
>  |although using an SQL database for a key-value store is an oversized
>  |approach, so some price has to be paid.
> 
> Really?  Well, it really is, and by a lot; even after manually
> applying VACUUM the file is twice as large, but worse is that actual
> spam checking is a factor of two _and_more_ slower than with DB, even
> if bogofilter is compiled -O1 -g.  Maybe or even surely that also has
> to do with fsync(2) or whatever, i do not know, but that does not
> change a thing.  I am doing `spamrate' (which actually is
> "bogofilter -TTu 2>/dev/null") and i would not expect that to come
> into play.  Yet the sqlite DB file times change(d) even after such
> a command.  They do not for DB.  I do not know.

I know that SQLite3 used to be substantially slower than Berkeley DB
when I last benchmarked both many a year ago, and a factor of 2 - 3
seems plausible but has been irrelevant for all my personal mail volumes.
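
To make the "SQL database as a key-value store" point concrete, here is
a minimal Python sketch using the standard sqlite3 module. The table
name, column names and helper names (open_store, put, get) are made up
for illustration and are not claimed to match bogofilter's actual
schema or code. It only shows that every lookup and update goes through
the SQL layer, which is part of the price mentioned in the quote.

```python
import sqlite3

def open_store(path=":memory:"):
    # Hypothetical schema: one table mapping a token to an opaque value.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB)"
    )
    return conn

def put(conn, key, value):
    # Each write is an SQL statement wrapped in a transaction;
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value),
        )

def get(conn, key):
    # Each read is an SQL query that resolves to a primary-key lookup.
    row = conn.execute(
        "SELECT value FROM kv WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

Timing a loop of get() calls against the same keys in another store
would be one way to produce the kind of figures asked for above.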

Note that -u incurs writes, so it is subject to whatever consistency and
durability model the database uses. It will hurt with LMDB too, because
LMDB allows only one writer at a time: processes that use "-u" are
serialized just as much as they would be with -n or -s (unless you have
lots of unsures, in which case -u is useless anyway).
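
The serialization effect can be illustrated with a small Python model.
This is only a sketch under loud assumptions: threads stand in for
separate bogofilter processes, a threading.Lock stands in for the
database's single-writer lock, and the name run_writers is made up. It
does not use the LMDB API at all.

```python
import threading

def run_writers(n):
    """Start n 'writer' threads; return (max concurrent writers, writes done)."""
    write_lock = threading.Lock()  # stand-in for the single-writer lock
    state = {"active": 0, "max_active": 0, "done": 0}

    def update():
        # Every updating run has to take the one write lock first,
        # so the updates happen one after another, never in parallel.
        with write_lock:
            state["active"] += 1
            state["max_active"] = max(state["max_active"], state["active"])
            state["done"] += 1
            state["active"] -= 1

    threads = [threading.Thread(target=update) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_active"], state["done"]
```

However many writers are started, the first number returned stays at 1:
that is the serialization described above, and it applies to any run
that writes, not only to explicit registration.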

>  |The database implementation needs some support logic in
>  |bogofilter/configure.ac and bogofilter/src/Makefile.am
> 
> That surely is the very very hard part.  :)

Not if you know autotools (autoconf/automake) a bit. O:-)

> Interestingly i only have a wordlist.db on AlpineLinux, whereas
> before there were logs and multiple other things (which in fact
> i have forgotten, and am too lazy to boot the old box)!?!

It depends. SQLite uses the same extension, .db, and Berkeley DB has two
modes. One is the plain old non-transactional mode, which is not ACID
compliant and has only the one [wordlist].db file. I advise against
using it unless you can recreate the database at any time from saved
spam/ham corpora.

The other is the transactional mode (advertised as the Berkeley DB
Transactional Data Store), which writes additional files. In the
trivial case the directory looks like this, for instance:

__db.001
__db.002
__db.003
lockfile-d
lockfile-p
log.0000000001
wordlist.db

where

__db.* are the environment (they can be scrapped when no process is
running),
lockfile-? are lock files (needed because DB < 4.4 cannot auto-recover
transactional databases, and we need to track whether a writer crashed
previously),
log.* are the actual transaction logs (necessary to recover from the
latest valid checkpoint, so these are precious), and
wordlist.db is the database you know (and that is precious unless you
save all logs).
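
As a quick aid for deciding what to back up, the file roles listed
above can be written down as a small Python sketch. The function name
classify and the role labels are my own shorthand, and the patterns are
only inferred from the example listing in this message, so treat them
as assumptions rather than a specification of what Berkeley DB or
bogofilter may create.

```python
import re

# Patterns inferred from the example directory listing; the role labels
# are illustrative shorthand, not bogofilter terminology.
_ROLES = [
    (re.compile(r"__db\.\d+"), "environment"),   # disposable when idle
    (re.compile(r"lockfile-."), "lockfile"),     # crash tracking
    (re.compile(r"log\.\d+"), "log"),            # precious
    (re.compile(r"wordlist\.db"), "database"),   # precious unless all logs kept
]

def classify(name):
    """Return the role of a file in a transactional bogofilter directory."""
    for pattern, role in _ROLES:
        if pattern.fullmatch(name):
            return role
    return "unknown"
```

For example, classify("log.0000000001") gives "log" and
classify("__db.002") gives "environment"; anything that does not match
is reported as "unknown" rather than guessed at.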
