DB backend support for lmdb?
Steffen Nurpmeso
steffen at sdaoden.eu
Tue May 29 20:51:11 CEST 2018
Ahoi.
Matthias Andree wrote:
|On 28.05.2018 at 23:57, Steffen Nurpmeso wrote:
|>|Note that -u incurs writes so is prone to whatever consistency and
|>|durability models the database uses - and it will hurt with LMDB too due
|>|to its "one writer only", so you serialize those processes if you use
|>|"-u" just as much as you would with -n or -s (unless you have lots of
|>|unsures, when -u is useless).
|>
|> Hmm, likely the test was not thorough enough. Sorry. Yes, of
|> course, but likely because the same messages had been passed
|> nothing new happened (except counter increments?); the sqlite
|> version did change in size and the DB one did not, that was
|> definitely true.
|
|It depends on whether the write causes page overflows and new pages to
|be created, or not. Certainly using SQLite is overkill because we still
|used it as a key/value store... but it was a user request at the time
|and I have yet to see reports about other trouble than it not being the
|fastest or writing the smallest databases.

Please, i have absolutely nothing against sqlite, i used it
myself. It may even be interesting to have the possibility to
look into the tables and access them with SQL statements, for
those who have interest in that.

|>|>|The database implementation needs some support logic in
|>|>|bogofilter/configure.ac and bogofilter/src/Makefile.am
|>|>
|>|> That surely is the very very hard part. :)
|>|
|>|Not if you know autotools (autoconf/automake) a bit. O:-)
|>
|> It seems to be known that i really dislike this! Yes, that is
|> true; of course this stuff is powerful, and depending on the
|> project it may be just ok. For me i dislike it and like projects
|> which only test what they really need and possibly work fine with
|> a simple "make". Take bogofilter, for example. Surely the m4/ is
|> pretty small, but the configuration performs many tests which
|> could be combined (for example, the integer typedefs), and then
|> tests aclocal, automake and autoconf (why?), ..and then.. runs
|> config.status --recheck and all the stuff is tested once again!
|> Then compilation starts. Yay.
|
|A --recheck should not happen on the tarballs, except if your clock
|(system or file system timestamps) is very coarse or non-monotonic.
|After "svn update" it will usually happen.

Maybe because i have it in git, and all those repos are reduced to
a "null" branch to save backup space. If i need something i check
out the "master" and compile that. And git does not restore file
times when the checkout happens. Ok, so maybe also my fault.

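To illustrate why missing file times can trigger a --recheck: make-style
tools decide what to regenerate by comparing modification times, and after
a fresh checkout a generated file may look older than its input. Below is a
minimal sketch of that comparison in Python. The helper name is_stale is
hypothetical, and this only mimics the general timestamp rule; it is not
what automake or config.status literally execute.

```python
import os


def is_stale(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    Hypothetical helper that mimics the timestamp rule make applies:
    a target is rebuilt when some prerequisite has a newer mtime.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any prerequisite modified after the target makes the target stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, is_stale("configure", ["configure.ac"]) would report True
whenever the checkout happened to write configure.ac after configure, even
though neither file's content changed.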
|If it only hurts developers, that's a non-issue, and I paid attention to
|write the configure cache out in strategic places. You know that when I
|started developing bogofilter c. 15 years ago, computers and disk drives
|were a lot slower.
|
|./configure -C # ... advised. :-)

Yes i know the former. The latter not, i will try it; especially
today many projects with submodules require multiple configuration
runs, i hope that helps there. I have added that to all
configure runs in my (extern.)code.arena makefile. Thanks!

|The autotools stuff is in place, works reliably, has a rich feature set,
|and with recent automake implementations, "make check" tests run in
|parallel. Do that with SSD on a modern octocore computer and see it fly
|in spite of a recursive Makefile structure, or perhaps with /tmp a
|RAMDISK and then make check BF_TESTDIR=/tmp -- I wouldn't want to use an
|early Raspberry Pi as development platform though. Deployment is
|another matter.
hmmhmm, yes, well.. ;)
|>|It depends. SQLite uses the same extension, .db, and Berkeley DB has two
|>|modes, one is the plain old (which is not ACID compliant) and has only
|>|the one [wordlist].db file - I advise against using that unless you can
|>|recreate the database anytime from saved spam/ham corpora.
|>
|> Ah! This i did not know, i have always worked with packages until
|> just recently, after finding the space and performance issue
|> i compiled on my own the first time. Sorry, sorry; i have read
|> the bogofilter manual once when i had thrown away the homebrew
|> junk mail code from the MUA i maintain, in order to create
|> a working environment for the new mail code -- in summer 2013.
|
|bogofilter -V tells you the database it is using, and there's doc/README.db.
|
|>|The other is the transactional mode (advertised as the Berkeley DB
|>|Transactional Data Store) that writes additional files, for instance,
|>|trivially:
|>|
|>|__db.001
|>|__db.002
|>|__db.003
|>|lockfile-d
|>|lockfile-p
|>|log.0000000001
|>|wordlist.db
|>
|> Yes, that is what i knew, and this is what i get after
|> recompilation with --enable-transactions. Thanks for the
|> information!
|
|And perhaps after running any registering stuff with
|--db-transaction=yes once. And perhaps undoing the same registration.

It is a powerful tool, and i am using only 5 percent of it.
(That command line option is not to be seen in a manual page.)

|> It seems to me there is quite a lot of context that i did not know
|> about. So i think implementing LMDB support will not be a quick
|> shot if done right, i need to read a lot of documentation from
|> LMDB and source code from Bogofilter. So it may take a little bit
|
|LMDB should be less of a hassle - and on second thought, it might be
|easier to start the implementation off after reading the Berkeley DB
|*and* Kyotocabinet (or Tokyocabinet) implementations to see the simpler
|ones.
Will do, soon.
|> longer until i can provide a patch -- nonetheless, i am definitely
|> very interested in LMDB support for bogofilter, if doable, because
|> it is very small (the raw AlpineLinux code package is 90KB,
|> whereas DB is 1.6MB; the cloned repo is 1.2MB, whereas the 5.3.28
|> DB tar ball unpacked in git is 31MB), and the code is also open
|> and openly maintained. And Postfix supports LMDB as a replacement
|> for DB out of the box, too. All this is very desirable to me.
|
|Repo size of a support library isn't normally a relevant metric, but
|this is a valid point, as is its license:
|
|   text    data     bss     dec     hex filename
|  80510    1504       8   82022   14066 /usr/lib64/liblmdb.so
Runtime is much smaller here, too:
#?0[steffen at essex nail.git]$ size /usr/lib/liblmdb.so
   text    data     bss     dec     hex filename
  69680    1344      80   71104   115c0 /usr/lib/liblmdb.so
#?0[steffen at essex nail.git]$ size /usr/lib/libdb.so
   text    data     bss     dec     hex filename
1549515   38744      64 1588323  183c63 /usr/lib/libdb.so
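To put the two size(1) results side by side, here is a small Python sketch
that parses the output quoted above and computes the ratio of the "dec"
totals. The function name parse_size is hypothetical; the numbers are
copied verbatim from the two runs shown.

```python
# Output of the two size(1) runs above, merged under one header line.
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
  69680    1344      80   71104   115c0 /usr/lib/liblmdb.so
1549515   38744      64 1588323  183c63 /usr/lib/libdb.so
"""


def parse_size(output):
    """Map filename -> integer section sizes from Berkeley-format size(1) output."""
    rows = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header (its first field is not a number).
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        text, data, bss, dec = (int(f) for f in fields[:4])
        rows[fields[5]] = {"text": text, "data": data, "bss": bss, "dec": dec}
    return rows


sizes = parse_size(SIZE_OUTPUT)
lmdb = sizes["/usr/lib/liblmdb.so"]["dec"]
bdb = sizes["/usr/lib/libdb.so"]["dec"]
print(f"libdb.so is about {bdb / lmdb:.1f} times the size of liblmdb.so")
```

On these figures the Berkeley DB shared library comes out at roughly 22
times the size of the LMDB one.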
I am looking forward to this.
Ciao, and thanks for the information!
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)