DB backend support for lmdb?
Steffen Nurpmeso
steffen at sdaoden.eu
Mon May 28 23:57:23 CEST 2018
Hello.
Matthias Andree <matthias.andree at gmx.de> wrote:
|Am 26.05.2018 um 23:43 schrieb Steffen Nurpmeso:
|>|If you have figures to prove that SQLite is slower by factors (beyond
|>|3), I'd like to see them; other than that we've had D. Richard Hipp look
|>|over our implementation and it's optimized quite a bit already -
|>|although using an SQL database for a key-value store is an oversized
|>|approach, so some price has to be paid.
|>
|> Really? Well, it really is, and a lot; even after manually
|> applying VACUUM it is twice as large, but worse is that actual
|> spam checking is factor two _and_more_ slower than DB, even if
|> bogofilter is compiled -O1 -g. Maybe or even surely that also has
|> to do with fsync(2) or whatever, i do not know, but that does not
|> change a thing. I am doing `spamrate' (which actually is
|> "bogofilter -TTu 2>/dev/null") and i would not expect that to come
|> into play. Yet sqlite DB file times change(d) even after such
|> a command. They do not for DB. I do not know.
|
|I know that SQLite3 used to be substantially slower than Berkeley DB
|when I last benchmarked both many a year ago, and a factor of 2 - 3
|seems plausible but has been irrelevant for all my personal mail volumes.
For me it runs on the client, and then several hundred mails come
in in a rush at least once a day, and this means you have to wait;
minutes in practice here, at least then. Will surely get better
with the new box in summer. Or i could of course move the process
to the server and splice the stuff into Postfix delivery. It is
just, you know, very small here, and i have my daily backup
script, on the stick, and in my pocket.
|Note that -u incurs writes so is prone to whatever consistency and
|durability models the database uses - and it will hurt with LMDB too due
|to its "one writer only", so you serialize those processes if you use
|"-u" just as much as you would with -n or -s (unless you have lots of
|unsures, when -u is useless).
Hmm, likely the test was not thorough enough. Sorry. Yes, of
course, but likely, because the same messages had been passed,
nothing new happened (except counter increments?); the sqlite
version did change the size, and the DB one did not, that was
definitely true.
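A minimal sketch of how such a check could be repeated: record the
size and modification time of the database file, run the command,
and compare. The helper names snapshot() and changed_by() are
made up for illustration, and the wordlist path in the usage note
is an assumption, not something taken from this thread.

```python
import os
import subprocess
import sys


def snapshot(path):
    """Return (size in bytes, mtime in nanoseconds) for path."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def changed_by(command, path, stdin_data=b""):
    """Run command (an argv list) and report whether path changed.

    stdin_data is fed to the command, e.g. one mail message.
    """
    before = snapshot(path)
    subprocess.run(
        command,
        input=stdin_data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit status is not an error here
    )
    after = snapshot(path)
    return {
        "size_changed": before[0] != after[0],
        "mtime_changed": before[1] != after[1],
        "before": before,
        "after": after,
    }
```

It could be called as changed_by(["bogofilter", "-TTu"], path,
stdin_data=message), where path points at the wordlist.db in use
(assumed location) and message holds the raw bytes of one mail.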
|>|The database implementation needs some support logic in
|>|bogofilter/configure.ac and bogofilter/src/Makefile.am
|>
|> That surely is the very very hard part. :)
|
|Not if you know autotools (autoconf/automake) a bit. O:-)
It seems to be known that i really dislike this! Yes, that is
true; of course this stuff is powerful, and depending on the
project it may be just ok. For me i dislike it and like projects
which only test what they really need and possibly work fine with
a simple "make". Take bogofilter, for example. Surely the m4/ is
pretty small, but the configuration performs many tests which
could be combined (for example, the integer typedefs), and then
tests aclocal, automake and autoconf (why?), ..and then.. runs
config.status --recheck and all the stuff is tested once again!
Then compilation starts. Yay.
No one is perfect: for me all this surely is a psychological issue
too; I spent so much time creating halfway nice makefiles etc.
For example i still have an xfig makefile, even though i have not
used the program in way over a decade, likely more. More.
|> Interestingly i only have a wordlist.db on AlpineLinux, whereas
|> before there were logs and multiple other things (which in fact
|> i have forgotten, and am too lazy to boot the old box)!?!
|
|It depends. SQLite uses the same extension, .db, and Berkeley DB has two
|modes, one is the plain old (which is not ACID compliant) and has only
|the one [wordlist].db file - I advise against using that unless you can
|recreate the database anytime from saved spam/ham corpora.
Ah! This i did not know, i have always worked with packages until
just recently, after finding the space and performance issue
i compiled on my own the first time. Sorry, sorry; i have read
the bogofilter manual once when i had thrown away the homebrew
junk mail code from the MUA i maintain, in order to create
a working environment for the new mail code -- in summer 2013.
|The other is the transactional mode (advertised as the Berkeley DB
|Transactional Data Store) that writes additional files, for instance,
|trivially:
|
|__db.001
|__db.002
|__db.003
|lockfile-d
|lockfile-p
|log.0000000001
|wordlist.db
Yes, that is what i knew, and this is what i get after
recompilation with --enable-transactions. Thanks for the
information!
|where __db.* are the environment (can be scrapped when no process is
|running), lockfile-? (needed because DB < 4.4 cannot auto-recover
|transactional databases, and we need to track if a writer crashed
|previously),
|log.* which are the actual transaction logs (necessary to recover from
|the latest valid checkpoint, so these are precious) and
|wordlist.db that you know (and that is precious unless you save all logs).
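The file roles described above can be sketched as a small lookup,
for example to decide what a backup script should pick up. This is
only an illustration built from the description in this message;
the names ROLES, classify() and backup_candidates() are made up,
and the lockfiles are marked None because the message does not say
whether they need to be saved.

```python
import fnmatch

# (glob pattern, role, precious?) as described in the message above.
# precious is None where the message makes no statement.
ROLES = [
    ("__db.*", "environment", False),      # can be scrapped when idle
    ("lockfile-?", "lockfile", None),      # tracks crashed writers
    ("log.*", "transaction log", True),    # needed for recovery
    ("wordlist.db", "database", True),     # precious unless all logs kept
]


def classify(name):
    """Return (role, precious) for a file name in the DB directory."""
    for pattern, role, precious in ROLES:
        if fnmatch.fnmatchcase(name, pattern):
            return (role, precious)
    return ("unknown", None)


def backup_candidates(names):
    """Return the names that the description marks as precious."""
    return [n for n in names if classify(n)[1] is True]
```

Fed the directory listing quoted earlier, backup_candidates() keeps
log.0000000001 and wordlist.db and leaves out the __db.* files.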
It seems to me there is quite a lot of context that i did not know
about. So i think implementing LMDB support will not be a quick
shot if done right, i need to read a lot of documentation from
LMDB and source code from Bogofilter. So it may take a little bit
longer until i can provide a patch -- nonetheless, i am definitely
very interested in LMDB support for bogofilter, if doable, because
it is very small (the raw AlpineLinux code package is 90KB,
whereas DB is 1.6MB; the cloned repo is 1.2MB, whereas the 5.3.28
DB tar ball unpacked in git is 31MB), and the code is also open
and openly maintained. And Postfix supports LMDB as a replacement
for DB out of the box, too. All this is very desirable to me.
Ciao!
--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)