DB backend support for lmdb?

Matthias Andree matthias.andree at gmx.de
Tue May 29 08:21:07 CEST 2018


Hi,

On 28.05.2018 at 23:57, Steffen Nurpmeso wrote:

>  |Note that -u incurs writes so is prone to whatever consistency and
>  |durability models the database uses - and it will hurt with LMDB too due
>  |to its "one writer only", so you serialize those processes if you use
>  |"-u" just as much as you would with -n or -s (unless you have lots of
>  |unsures, when -u is useless).
> 
> Hmm, likely the test was not thorough enough.  Sorry.  Yes, of
> course, but likely, because the same messages had been passed,
> nothing new happened (except counter increments?); the sqlite version
> did change in size, and the DB one did not, that was definitely true.

It depends on whether the write causes page overflows and new pages to
be created, or not.  Certainly using SQLite is overkill because we only
use it as a key/value store... but it was a user request at the time
and I have yet to see reports of trouble other than it not being the
fastest or not writing the smallest databases.
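
To illustrate what "SQLite as a key/value store" means, here is a minimal
sketch using Python's standard sqlite3 module.  The table name, column
names and helper functions are made up for this example and are not
bogofilter's actual schema or code.

```python
import sqlite3

# Hypothetical schema for illustration only; not bogofilter's real layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)")


def kv_put(conn, key, value):
    """Store value under key, overwriting any existing entry."""
    # "with conn" wraps the statement in a transaction and commits on success.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value),
        )


def kv_get(conn, key):
    """Return the stored value for key, or None if the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row is not None else None


kv_put(conn, b"token", b"\x01\x00")
kv_put(conn, b"token", b"\x02\x00")  # overwrite: still one row for this key
```

Everything SQL offers beyond this single two-column table goes unused,
which is the sense in which it is overkill here.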

>  |>|The database implementation needs some support logic in
>  |>|bogofilter/configure.ac and bogofilter/src/Makefile.am
>  |> 
>  |> That surely is the very very hard part.  :)
>  |
>  |Not if you know autotools (autoconf/automake) a bit. O:-)
> 
> It seems to be known that i really dislike this!  Yes, that is
> true; of course this stuff is powerful, and depending on the
> project it may be just ok.  For me i dislike it and like projects
> which only test what they really need and possibly work fine with
> a simple "make".  Take bogofilter, for example.  Surely the m4/ is
> pretty small, but the configuration performs many tests which
> could be combined (for example, the integer typedefs), and then
> tests aclocal, automake and autoconf (why?), ..and then.. runs
> config.status --recheck and all the stuff is tested once again!
> Then compilation starts.  Yay.

A --recheck should not happen on the tarballs, except if your clock
(system or file system timestamps) is very coarse or non-monotonic.
After "svn update" it will usually happen.
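
The timestamp dependence can be sketched as the usual make-style rule: a
target is redone when a prerequisite has a newer modification time.  The
snippet below is a simplified illustration in Python, not automake's
actual rule set; the helper names are invented, and the two file names
are only stand-ins created in a temporary directory.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or any prerequisite is newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    return any(os.stat(p).st_mtime_ns > target_mtime for p in prerequisites)


def touch_at(path, seconds):
    """Create path if needed and set its atime/mtime to a fixed time."""
    with open(path, "a"):
        pass
    os.utime(path, ns=(seconds * 10**9, seconds * 10**9))


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "configure")      # stand-in prerequisite
    out = os.path.join(d, "config.status")  # stand-in target
    touch_at(src, 1000)
    touch_at(out, 2000)
    fresh = needs_rebuild(out, [src])  # target newer: nothing to redo
    touch_at(src, 3000)
    stale = needs_rebuild(out, [src])  # prerequisite newer: redo
```

With coarse or non-monotonic timestamps, the unpacked files can end up
in the second situation even though nothing was edited.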

If it only hurts developers, that's a non-issue, and I took care to
write the configure cache out at strategic places.  Keep in mind that
when I started developing bogofilter c. 15 years ago, computers and disk
drives were a lot slower.

./configure -C # ...  advised. :-)

The autotools stuff is in place, works reliably, has a rich feature set,
and with recent automake implementations, "make check" tests run in
parallel.  Do that on an SSD in a modern octa-core computer and see it fly
in spite of a recursive Makefile structure, or perhaps put /tmp on a
RAM disk and then run "make check BF_TESTDIR=/tmp".  I wouldn't want to
use an early Raspberry Pi as a development platform, though.  Deployment
is another matter.

>  |It depends. SQLite uses the same extension, .db, and Berkeley DB has two
>  |modes, one is the plain old (which is not ACID compliant) and has only
>  |the one [wordlist].db file - I advise against using that unless you can
>  |recreate the database anytime from saved spam/ham corpora.
> 
> Ah!  This i did not know, i have always worked with packages until
> just recently, after finding the space and performance issue
> i compiled on my own the first time.  Sorry, sorry; i have read
> the bogofilter manual once when i had thrown away the homebrew
> junk mail code from the MUA i maintain, in order to create
> a working environment for the new mail code -- in summer 2013.

bogofilter -V tells you the database it is using, and there's doc/README.db.

> 
>  |The other is the transactional mode (advertised as this Berkeley DB
>  |Transactional Data Store) that writes additional files, for instance,
>  |trivially:
>  |
>  |__db.001
>  |__db.002
>  |__db.003
>  |lockfile-d
>  |lockfile-p
>  |log.0000000001
>  |wordlist.db
> 
> Yes, that is what i knew, and this is what i get after
> recompilation with --enable-transactions.  Thanks for the
> information!

And perhaps after running some registration with --db-transaction=yes
once, and perhaps undoing that same registration afterwards.
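
As a rough way to tell the two Berkeley DB layouts apart from a directory
listing, one could check for the extra files named above.  This is only
a heuristic sketch in Python based on the file names quoted in this
message; the function name and patterns are assumptions, and
"bogofilter -V" plus doc/README.db remain the better sources.

```python
import re

# Patterns derived from the file names quoted above; a heuristic only.
REGION_RE = re.compile(r"^__db\.\d{3}$")  # e.g. __db.001
LOG_RE = re.compile(r"^log\.\d{10}$")     # e.g. log.0000000001


def looks_transactional(filenames):
    """Guess whether a listing contains transactional-mode extra files."""
    names = list(filenames)
    has_region = any(REGION_RE.match(n) for n in names)
    has_log = any(LOG_RE.match(n) for n in names)
    return has_region and has_log


listing = [
    "__db.001", "__db.002", "__db.003",
    "lockfile-d", "lockfile-p",
    "log.0000000001", "wordlist.db",
]
```

In practice the listing would come from something like os.listdir() on
the bogofilter directory.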

> It seems to me there is quite a lot of context that i did not know
> about.  So i think implementing LMDB support will not be a quick
> shot if done right, i need to read a lot of documentation from
> LMDB and source code from Bogofilter.  So it may take a little bit

LMDB should be less of a hassle - and on second thought, it might be
easier to start the implementation off after reading the Berkeley DB
*and* Kyotocabinet (or Tokyocabinet) implementations to see the simpler
ones.

> longer until i can provide a patch -- nonetheless, i am definitely
> very interested in LMDB support for bogofilter, if doable, because
> it is very small (the raw AlpineLinux code package is 90KB,
> whereas DB is 1.6MB; the cloned repo is 1.2MB, whereas the 5.3.28
> DB tar ball unpacked in git is 31MB), and the code is also open
> and openly maintained.  And Postfix supports LMDB as a replacement
> for DB out of the box, too.  All this is very desirable to me.

Repo size of a support library isn't normally a relevant metric, but
this is a valid point, as is its license:

   text	   data	    bss	    dec	    hex	filename
  80510	   1504	      8	  82022	  14066	/usr/lib64/liblmdb.so
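
For readers unfamiliar with this output: "dec" is the sum of text, data
and bss in bytes, and "hex" is the same total in hexadecimal.  A small
Python sketch (the helper name is made up) that parses the line above
and checks this:

```python
def parse_size_line(line):
    """Split one data line of Berkeley-format size output into fields."""
    fields = line.split()
    text, data, bss, dec = (int(f) for f in fields[:4])
    return {
        "text": text,
        "data": data,
        "bss": bss,
        "dec": dec,
        "hex": int(fields[4], 16),  # the hex column has no 0x prefix
        "filename": fields[5],
    }


row = parse_size_line("80510\t1504\t8\t82022\t14066\t/usr/lib64/liblmdb.so")
total = row["text"] + row["data"] + row["bss"]
```

So the shared library comes to roughly 80 KiB in total.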

-- 
Matthias Andree


