DB backend support for lmdb?

Steffen Nurpmeso steffen at sdaoden.eu
Mon May 28 23:57:23 CEST 2018


Hello.

Matthias Andree <matthias.andree at gmx.de> wrote:
 |Am 26.05.2018 um 23:43 schrieb Steffen Nurpmeso:
 |>|If you have figures to prove that SQLite is slower by factors (beyond
 |>|3), I'd like to see them; other than that we've had D. Richard Hipp look
 |>|over our implementation and it's optimized quite a bit already -
 |>|although using an SQL database for a key-value store is an oversized
 |>|approach, so some price has to be paid.
 |> 
 |> Really?  Well, it really is, and a lot; even after manually
 |> applying VACUUM it is twice as large, but worse is that actual
 |> spam checking is a factor of two _and_more_ slower than DB, even if
 |> bogofilter is compiled -O1 -g.  Maybe or even surely that also has
 |> to do with fsync(2) or whatever, i do not know, but that does not
 |> change a thing.  I am doing `spamrate' (which actually is
 |> "bogofilter -TTu 2>/dev/null") and i would not expect that to come
 |> into play.  Yet sqlite DB file times change(d) even after such
 |> a command.  They do not for DB.  I do not know.
 |
 |I know that SQLite3 used to be substantially slower than Berkeley DB
 |when I last benchmarked both many a year ago, and a factor of 2 - 3
 |seems plausible but has been irrelevant for all my personal mail volumes.

For me it runs on the client, and then several hundred mails come
in in a rush at least once a day, and this means you have to wait;
minutes in practice here, at least then.  Will surely get better
with the new box in summer.  Or i could of course move the process
to the server and splice the stuff into Postfix delivery.  It is
just, you know, very small here, and i have my daily backup
script, on the stick, and in my pocket.

 |Note that -u incurs writes so is prone to whatever consistency and
 |durability models the database uses - and it will hurt with LMDB too due
 |to its "one writer only", so you serialize those processes if you use
 |"-u" just as much as you would with -n or -s (unless you have lots of
 |unsures, when -u is useless).

Hmm, likely the test was not thorough enough.  Sorry.  Yes, of
course, but likely, because the same messages had been passed,
nothing new happened (except counter increments?); the sqlite version
did change the size, and the DB one did not, that was definitely true.
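
The "nothing new, but counter increments" case can be illustrated with a minimal sketch. It assumes a hypothetical token-count table in SQLite; the schema and the names open_store, register and snapshot are invented for illustration and are not bogofilter's actual code. Registering the same words a second time adds no keys, yet every counter is rewritten, so the database still sees writes.

```python
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical schema for illustration only; not bogofilter's real layout.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "word TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    return conn


def register(conn, words):
    """Increment the counter of each word, creating rows as needed."""
    with conn:  # one transaction per registered message
        for word in words:
            cur = conn.execute(
                "UPDATE tokens SET count = count + 1 WHERE word = ?", (word,)
            )
            if cur.rowcount == 0:
                # Word not seen before: create it with a count of one.
                conn.execute(
                    "INSERT INTO tokens (word, count) VALUES (?, 1)", (word,)
                )


def snapshot(conn):
    """Return the whole table as a {word: count} dictionary."""
    return dict(conn.execute("SELECT word, count FROM tokens"))
```

Calling register(conn, ["offer", "hello"]) twice leaves the set of keys unchanged while both counters go from 1 to 2. Whether the file size or timestamps change as a result depends on the backend and is not modelled here.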

 |>|The database implementation needs some support logic in
 |>|bogofilter/configure.ac and bogofilter/src/Makefile.am
 |> 
 |> That surely is the very very hard part.  :)
 |
 |Not if you know autotools (autoconf/automake) a bit. O:-)

It seems to be known that i really dislike this!  Yes, that is
true; of course this stuff is powerful, and depending on the
project it may be just ok.  For me i dislike it and like projects
which only test what they really need and possibly work fine with
a simple "make".  Take bogofilter, for example.  Surely the m4/ is
pretty small, but the configuration performs many tests which
could be combined (for example, the integer typedefs), and then
tests aclocal, automake and autoconf (why?), ..and then.. runs
config.status --recheck and all the stuff is tested once again!
Then compilation starts.  Yay.
No one is perfect: for me all this surely is a psychological issue
too; I spent so much time creating halfway nice makefiles etc.
For example i still have an xfig makefile, even though i have not
used the program in way over a decade, likely more.  More.

 |> Interestingly i only have a wordlist.db on AlpineLinux, whereas
 |> before there were logs and multiple other things (which in fact
 |> i have forgotten, and am too lazy to boot the old box)!?!
 |
 |It depends. SQLite uses the same extension, .db, and Berkeley DB has two
 |modes, one is the plain old (which is not ACID compliant) and has only
 |the one [wordlist].db file - I advise against using that unless you can
 |recreate the database anytime from saved spam/ham corpora.

Ah!  This i did not know, i have always worked with packages until
just recently, after finding the space and performance issue
i compiled on my own for the first time.  Sorry, sorry; i have read
the bogofilter manual once when i had thrown away the homebrew
junk mail code from the MUA i maintain, in order to create
a working environment for the new mail code -- in summer 2013.

 |The other is the transactional mode (advertised as the Berkeley DB
 |Transactional Data Store) that writes additional files, for instance,
 |trivially:
 |
 |__db.001
 |__db.002
 |__db.003
 |lockfile-d
 |lockfile-p
 |log.0000000001
 |wordlist.db

Yes, that is what i knew, and this is what i get after
recompilation with --enable-transactions.  Thanks for the
information!

 |where __db.* are the environment (can be scrapped when no process is
 |running), lockfile-? (needed because DB < 4.4 cannot auto-recover
 |transactional databases, and we need to track if a writer crashed
 |previously),
 |log.* which are the actual transaction logs (necessary to recover from
 |the latest valid checkpoint, so these are precious) and
 |wordlist.db that you know (and that is precious unless you save all logs).
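
The file roles described above can be summarised in a small sketch. The role names and the functions classify and precious are hypothetical and exist only for illustration; the patterns and the notion of "precious" files are taken from the description in this message, not from bogofilter's source.

```python
from fnmatch import fnmatchcase

# (pattern, role) pairs, following the description in the message above.
PATTERNS = [
    ("__db.*", "environment"),     # can be scrapped when no process is running
    ("lockfile-?", "lockfile"),    # tracks whether a writer crashed previously
    ("log.*", "transaction log"),  # needed to recover from the last checkpoint
    ("wordlist.db", "wordlist"),   # the database itself
]

# Roles the message calls precious. The wordlist is described as precious
# "unless you save all logs"; this sketch simply always treats it as precious.
PRECIOUS = {"transaction log", "wordlist"}


def classify(name):
    """Return the role of a file name, or "unknown" if no pattern matches."""
    for pattern, role in PATTERNS:
        if fnmatchcase(name, pattern):
            return role
    return "unknown"


def precious(names):
    """Return the sorted file names that should not be thrown away."""
    return sorted(n for n in names if classify(n) in PRECIOUS)
```

For the directory listing quoted above, precious() would pick log.0000000001 and wordlist.db and leave out the __db.* and lockfile-? entries.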

It seems to me there is quite a lot of context that i did not know
about.  So i think implementing LMDB support will not be a quick
shot if done right, i need to read a lot of documentation from
LMDB and source code from Bogofilter.  So it may take a little bit
longer until i can provide a patch -- nonetheless, i am definitely
very interested in LMDB support for bogofilter, if doable, because
it is very small (the raw AlpineLinux code package is 90KB,
whereas DB is 1.6MB; the cloned repo is 1.2MB, whereas the 5.3.28
DB tarball unpacked in git is 31MB), and the code is also open
and openly maintained.  And Postfix supports LMDB as a replacement
for DB out of the box, too.  All this is very desirable to me.

Ciao!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


