DB backend support for lmdb?

Steffen Nurpmeso steffen at sdaoden.eu
Fri Jul 20 17:45:43 CEST 2018


Hello nochmal!

Matthias Andree wrote in <c4f77995-7556-9fe0-e7e7-e6470fa6f715 at gmx.de>:
 |Am 19.07.2018 um 22:22 schrieb Steffen Nurpmeso:
 |> Steffen Nurpmeso wrote in <20180719193041.33VX0%steffen at sdaoden.eu>:
 |>|Matthias Andree wrote in <9a51593a-a5ce-06f3-6070-af25b1a6db34 at an3e.de>:
 |>||Am 18.07.2018 um 01:19 schrieb Matthias Andree:
 ...
 |>|All tests pass for me now.
 |> 
 |> Yes, but there is one little error nonetheless, as below.
 |> Thank you, Matthias.  I have just installed bogofilter/LMDB in ~/!
 ...
 |I have updated the code with your file, this patch, and merged things
 |back into the trunk. I have also added a few warning fixes, and I think
 |one of the warnings flagged by GCC was a string comparison bug that used
 |a s1 == s2 on the pointers, I have made that a 0 == strcmp(s1, s2).

Compilation with C++ compiler, i have not done this for years!
I did this too (only compiled with C++ compiler), but C++ became
so fat, and the MUA i maintain was an absolute C++ no-go, and it
is so easy to simply assign from void* to any.  But cool.

And that is a bug!  In the patch below, and as below, i added two
explicit rodata symbols, and reintroduced the plain pointer
comparison, i hope this is ok.

 |Please check out the latest bogofilter SVN from the trunk.
 |(Regarding sourceforget.net speed, I hope convert things to Git this year.)
 |
 |The trunk now seems fine WRT LMDB support on Fedora 28 and FreeBSD 11.2.

Great!  I am at r7076, thus!

 --End of <c4f77995-7556-9fe0-e7e7-e6470fa6f715 at gmx.de>

Matthias Andree wrote in <5b642383-5f44-8b9f-eb66-bf0e8ab52f1a at gmx.de>:
 |Am 18.07.2018 um 01:56 schrieb Steffen Nurpmeso:
 |> which could be used to work around this what i still think is
 |> a bug in LMDB, get rid of the complete replay log implementation
 |> and simply call abort() whenever the MDB_MAP_FULL error happens.
 |> This would surely improve performance, too.
 |
 |You have more experience with the LMDB driver - I am not acquainted with
 |it and thus not much of a guide...

Of course, i know this thoroughly for.. it is day 6!  (But if the
devil is 6 then god .. is 7.  This monkey's -gone- going to heaven.)

 |> Ok i will do this tomorrow.  Great!  Sourceforge now needs many
 |> minutes to create a zip snapshot.  I should boot FreeBSD and use
 |> svn, that would be better maybe.
 |
 |Or install subversion on the other OS.

Oh, i have totally forgotten how to use subversion, and i really
would like to avoid this!  I do not even have git-svn no more, not
a single entry in my "external code" arena uses subversion no
more.  (Those which did i converted to tar_bomb_git a.k.a. release
checkouts: i seem to remember that git-svn caused a lot of hanging
connections and such.  And then i also seem to remember that this
was also caused because modern subversion changed the internal
layout to maaaany files, which was a massive pain on my old 3600
rpm HD Mac Notebook.)

 |> What do you think of the 1 GB initial size?  Or even 2?  It should
 |> not hurt except for virtual size, at least optionally, with
 |> a preprocessor switch?
 |
 |I don't have much of an opinion on this - just wondering what happens if
 |you are running several parallel copies of bogofilter on a 32-bit
 |computer. Some people still use i386 for their mail servers :-)

Terrible!  I always get that wrong you know -- and isn't it just
terrible that ISO C defines that a plain "u" suffix scales the
constant to whatever size is needed, but if i, on the other hand,
say "1u<<X", then the resulting constant, and it is a constant,
needs a "lu" or even a "llu" suffix in order to end the desired
way!  I had to test whether 0x00000001u would actually work..
likely not.  Thanks for finding this!

So please find below another update.  I surely should have waited
until the state we have today before i sent out the patch for the
first time, i apologise for that.  Some clarifying comments, code
reorders, fit into 79 columns, we do no longer include unnecessary
headers, and the a_BFLM_FIXED_SIZE constant should now work
graceful also on 32-bit hosts: i guess going for a 42-bit file
size on >32-bit hosts, and for 31-bit otherwise, is ok?

And i agree, and did not make a_BFLM_FIXED_SIZE the default.  It
is also because i have no overview of operating systems other than
BSD and Linux, a little bit of Solaris, and i do not know how
graceful they, and their VFS implementation, handle such things.
But i have increased a_BFLM_GROW to ~16 MB, too.  And we test
whether we would overflow size_t when growing the DB, somehow.

Ciao, Matthias!

diff --git a/bogofilter/src/datastore_lmdb.c b/bogofilter/src/datastore_lmdb.c
index 949340e..b8c4015 100644
--- a/bogofilter/src/datastore_lmdb.c
+++ b/bogofilter/src/datastore_lmdb.c
@@ -8,7 +8,7 @@
  * Steffen Nurpmeso <steffen at sdaoden.eu>    2018
  * (copied from datastore_kc.c:
  * Gyepi Sam <gyepi at praxis-sw.com>          2003
- * Matthias Andree <matthias.andree at gmx.de> 2003
+ * Matthias Andree <matthias.andree at gmx.de> 2003, 2018
  * Stefan Bellon <sbellon at sbellon.de>       2003-2004
  * Pierre Habouzit <madcoder at debian.org>    2007
  * Denny Lin <dennylin93 at hs.ntnu.edu.tw>    2015)
@@ -41,18 +41,16 @@
  */
 
 /* Alternative implementation: fixed DB size */
-/*#define a_BFLM_FIXED_SIZE (1u << 31)*/
+/*#define a_BFLM_FIXED_SIZE (ULONG_MAX >> (ULONG_MAX != UINT_MAX ? 22 : 1))*/
 
 /* mdb_env_set_maxreaders() */
-#define a_BFLM_MAXREADERS 15
+#define a_BFLM_MAXREADERS 7
 
 #ifndef a_BFLM_FIXED_SIZE
-    /* DB size grow.  Must be a power of two!
+    /* DB size grow.  Must be a power of two (we perform alignment)!
      * Space it so that a DB load does not run against walls too many times.
-     * We try _TRIES times to resize for a single new entry before giving up.
-     * Note that the minimum size must be capable to store the two used DB
-     * structured without resize etc. */
-# define a_BFLM_GROW (1u << 23)
+     * We try _TRIES times to resize for a single new entry before giving up */
+# define a_BFLM_GROW (1u << 24)
 # define a_BFLM_GROW_TRIES 3
 
     /* Size of one chunk of the intermediate txn cache, as above.
@@ -60,7 +58,7 @@
      * Of course, if a token requires more space, we allocate a larger chunk */
 # define a_BFLM_TXN_CACHE_SIZE (1u << 20)
 
-    /* An entry consists of an uint32_t describing the length of the key.
+    /* A cache entry consists of an uint32_t describing the length of the key.
      * If the high bit is set an uint32_t describing the length of the value
      * follows.  After the data buffers there possibly is alignment pad */
 # define a_BFLM_TXN_CACHE_ALIGN(X) \
@@ -74,6 +72,7 @@
 #include "common.h"
 
 #include <errno.h>
+#include <limits.h>
 
 #include <lmdb.h>
 
@@ -82,7 +81,6 @@
 #include "error.h"
 #include "paths.h"
 #include "xmalloc.h"
-#include "xstrdup.h"
 
 #if MDB_VERSION_FULL < MDB_VERINT(0, 9, 22)
 # error "Required LMDB version: 0.9.22 or later (0.9.11 may do, but untested)"
@@ -114,7 +112,7 @@ struct a_bflm{
 
 #ifndef a_BFLM_FIXED_SIZE
 struct a_bflm_txn_cache{
-    struct a_bflm_txn_cache *bflmtc_last;
+    struct a_bflm_txn_cache *bflmtc_last;   /* Up-to-date (stack usage) */
     struct a_bflm_txn_cache *bflmtc_next;   /* Needs to be build before use! */
     char *bflmtc_caster;    /* Current caster */
     char *bflmtc_max;       /* Maximum usable byte, exclusive */
@@ -123,6 +121,9 @@ struct a_bflm_txn_cache{
 };
 #endif
 
+static char const a_bflm_db_name_man[] = a_BFLM_DB_NAME_MAN;
+static char const a_bflm_db_name_dat[] = a_BFLM_DB_NAME_DAT;
+
 /**/
 static struct a_bflm *a_bflm_init(bfpath *bfp, bool rdonly);
 static int a_bflm__check_create(struct a_bflm *bflmp);
@@ -134,18 +135,18 @@ static int a_bflm_txn_abort(void *vhandle);
 static int a_bflm_txn_commit(void *vhandle);
 
 #ifndef a_BFLM_FIXED_SIZE
-/**/
+/* A transaction needs to be resized and all modifications in the cache need to
+ * be replayed, because we have seen MDB_MAP_FULL (or MDB_MAP_RESIZED) */
 static bool a_bflm_txn_mapfull(struct a_bflm *bflmp, bool close_cursor);
 
+/* (NULL on success or an error message otherwise) */
+static char const *a_bflm_txn__replay(struct a_bflm *bflmp);
+
 /* Put an entry; it is a deletion if val_or_null is NULL.
  * Return NULL on success or an error message otherwise */
 static char const *a_bflm_txn_cache_put(struct a_bflm *bflmp, MDB_val *key,
                     MDB_val *val_or_null);
 
-/* Replay all the cache operations in order to redo the transaction.
- * Return NULL on success or an error message otherwise */
-static char const *a_bflm_txn_cache_replay(struct a_bflm *bflmp);
-
 /* Free the recovery stack and possible heap data */
 static void a_bflm_txn_cache_free(struct a_bflm *bflmp);
 #endif /* a_BFLM_FIXED_SIZE */
@@ -192,7 +193,7 @@ a_bflm_init(bfpath *bfp, bool rdonly){
     memcpy(rv->bflm_filepath = (char*)&rv[1], bfp->filepath, i);
 
     rv->bflm_flags = (((DEBUG_DATABASE(1) || getenv("BF_DEBUG_DB") != NULL)
-            ? a_BFLM_DEBUG : a_BFLM_NONE) |
+                ? a_BFLM_DEBUG : a_BFLM_NONE) |
             (rdonly ? a_BFLM_RDONLY : a_BFLM_NONE));
 
     e = mdb_env_create(&rv->bflm_env);
@@ -204,7 +205,7 @@ a_bflm_init(bfpath *bfp, bool rdonly){
     rv->bflm_maxkeysize = mdb_env_get_maxkeysize(rv->bflm_env);
 
     /* To acommodate with bogofilter's db_created() mechanism we cannot use the
-     * unnamed DB which "always exists", but must place data in a named one */
+     * unnamed DB which "always exists", but must place data in named ones */
     e = mdb_env_set_maxdbs(rv->bflm_env, 2);
     if(e != MDB_SUCCESS){
         emsg = "mdb_env_set_maxdbs()";
@@ -259,6 +260,7 @@ a_bflm__check_create(struct a_bflm *bflmp){
     char const *db_name;
     unsigned int f;
     int e;
+    /* TODO compile-time-assert that MDB_CREATE is not 0 */
 
 jredo_txn:
     e = mdb_txn_begin(bflmp->bflm_env, NULL, 0, &bflmp->bflm_txn);
@@ -270,7 +272,7 @@ jredo_txn:
         goto jleave;
     }
 
-    db_name = a_BFLM_DB_NAME_MAN;
+    db_name = a_bflm_db_name_man;
     f = 0;
 jredo_dbi:
     e = mdb_dbi_open(bflmp->bflm_txn, db_name, f, &bflmp->bflm_dbi);
@@ -283,8 +285,8 @@ jredo_dbi:
         goto jerr;
     }
 
-    if(f == MDB_CREATE && 0 == strcmp(db_name, a_BFLM_DB_NAME_MAN)){
-        db_name = a_BFLM_DB_NAME_DAT;
+    if(f == MDB_CREATE && db_name == a_bflm_db_name_man){
+        db_name = a_bflm_db_name_dat;
         goto jredo_dbi;
     }
 
@@ -346,9 +348,9 @@ jredo_txn:
         goto jerr1;
     }
 
-    e = mdb_dbi_open(bflmp->bflm_txn, a_BFLM_DB_NAME_DAT, 0, &bflmp->bflm_dbi);
+    e = mdb_dbi_open(bflmp->bflm_txn, a_bflm_db_name_dat, 0, &bflmp->bflm_dbi);
     if(e != MDB_SUCCESS){
-        if(e == MDB_NOTFOUND){
+        if(e == MDB_NOTFOUND && (bflmp->bflm_flags & a_BFLM_RDONLY)){
             bflmp->bflm_flags |= a_BFLM_DB_UNAVAIL;
             goto junavail;
         }
@@ -476,12 +478,20 @@ a_bflm_txn_mapfull(struct a_bflm *bflmp, bool close_cursor){
 
     mdb_txn_abort(bflmp->bflm_txn);
 
-    /* Resize map */
+    /* Resize map.  To be super-safe, synchronize current map size first */
 jredo_txn:
+    mdb_env_set_mapsize(bflmp->bflm_env, 0);
     /* no error defined */mdb_env_info(bflmp->bflm_env, &envinfo);
     i = envinfo.me_mapsize;
-    i += a_BFLM_GROW / 10;
-    i = (i + (a_BFLM_GROW - 1)) & ~(a_BFLM_GROW - 1);
+    if((size_t)-1 - i >= a_BFLM_GROW * 2){
+        i += a_BFLM_GROW / 10;
+        i = (i + (a_BFLM_GROW - 1)) & ~(a_BFLM_GROW - 1);
+    }else if((size_t)-1 - i >= 1024u * 1024u * 2)
+        i = (size_t)-1 - (1024u * 1024u - 1);
+    else{
+        emsg = "DB size too large";
+        goto jerr1;
+    }
     e = mdb_env_set_mapsize(bflmp->bflm_env, i);
     if(e != MDB_SUCCESS){
         emsg = "mdb_env_set_mapsize()";
@@ -491,15 +501,13 @@ jredo_txn:
     /* Recreate transaction */
     e = mdb_txn_begin(bflmp->bflm_env, NULL, 0, &bflmp->bflm_txn);
     if(e != MDB_SUCCESS){
-        if(e == MDB_MAP_RESIZED){
-            mdb_env_set_mapsize(bflmp->bflm_env, 0);
+        if(e == MDB_MAP_RESIZED)
             goto jredo_txn;
-        }
         emsg = "mdb_txn_begin()";
         goto jerr1;
     }
 
-    e = mdb_dbi_open(bflmp->bflm_txn, a_BFLM_DB_NAME_DAT, 0, &bflmp->bflm_dbi);
+    e = mdb_dbi_open(bflmp->bflm_txn, a_bflm_db_name_dat, 0, &bflmp->bflm_dbi);
     if(e != MDB_SUCCESS){
         emsg = "mdb_dbi_open()";
         goto jerr2;
@@ -511,7 +519,7 @@ jredo_txn:
         goto jerr2;
     }
 
-    if((emsg = a_bflm_txn_cache_replay(bflmp)) != NULL)
+    if((emsg = a_bflm_txn__replay(bflmp)) != NULL)
         goto jerr3;
 
     if(bflmp->bflm_flags & a_BFLM_DEBUG)
@@ -533,6 +541,72 @@ jerr1:
     goto jleave;
 }
 
+static char const *
+a_bflm_txn__replay(struct a_bflm *bflmp){
+    MDB_val key, val;
+    char const *emsg;
+    int e;
+    uint32_t kl, vl;
+    char *dp;
+    struct a_bflm_txn_cache *head, *bflmtcp;
+
+    /* First of all create a list in the right order */
+    for(head = NULL, bflmtcp = bflmp->bflm_txn_cache; bflmtcp != NULL;
+            bflmtcp = bflmtcp->bflmtc_last){
+        bflmtcp->bflmtc_next = head;
+        head = bflmtcp;
+    }
+
+    /* Then replay, using it */
+    for(; head != NULL; head = head->bflmtc_next){
+        for(dp = head->bflmtc_data; dp < head->bflmtc_caster;){
+            bool isins;
+
+            /* For actual loading always use memcpy() for simplicity.
+             * (That is: C standard and undefined behaviour, who knows?) */
+            memcpy(&kl, dp, sizeof kl);
+            dp += sizeof kl;
+            if((isins = ((kl & 0x80000000u) != 0))){
+                kl ^= 0x80000000u;
+                memcpy(&vl, dp, sizeof vl);
+                dp += sizeof vl;
+            }
+
+            key.mv_size = kl;
+            key.mv_data = dp;
+            dp += kl;
+            if(isins){
+                val.mv_size = vl;
+                val.mv_data = dp;
+                dp += vl;
+
+                e = mdb_cursor_put(bflmp->bflm_cursor, &key, &val, 0);
+                if(e != MDB_SUCCESS){
+                    emsg = "mdb_cursor_put()";
+                    goto jleave;
+                }
+            }else{
+                e = mdb_cursor_get(bflmp->bflm_cursor, &key, NULL,
+                        MDB_SET_KEY);
+                if(e != MDB_SUCCESS){
+                    emsg = "mdb_cursor_get() for delete";
+                    goto jleave;
+                }
+                e = mdb_cursor_del(bflmp->bflm_cursor, 0);
+                if(e != MDB_SUCCESS){
+                    emsg = "mdb_cursor_del()";
+                    goto jleave;
+                }
+            }
+            dp = (char*)a_BFLM_TXN_CACHE_ALIGN((uintptr_t)dp);
+        }
+    }
+
+    emsg = NULL;
+jleave:
+    return emsg;
+}
+
 static char const *
 a_bflm_txn_cache_put(struct a_bflm *bflmp, MDB_val *key, MDB_val *val_or_null){
     uint32_t ui;
@@ -600,73 +674,6 @@ jleave:
     return emsg;
 }
 
-static char const *
-a_bflm_txn_cache_replay(struct a_bflm *bflmp){
-    /* And replay all the changes we have yet seen */
-    MDB_val key, val;
-    char const *emsg;
-    int e;
-    uint32_t kl, vl;
-    char *dp;
-    struct a_bflm_txn_cache *head, *bflmtcp;
-
-    /* First of all create a list in the right order */
-    for(head = NULL, bflmtcp = bflmp->bflm_txn_cache; bflmtcp != NULL;
-            bflmtcp = bflmtcp->bflmtc_last){
-        bflmtcp->bflmtc_next = head;
-        head = bflmtcp;
-    }
-
-    /* Then replay, using it */
-    for(; head != NULL; head = head->bflmtc_next){
-        for(dp = head->bflmtc_data; dp < head->bflmtc_caster;){
-            bool isins;
-
-            /* For actual loading always use memcpy() for simplicity.
-             * (That is: C standard and undefined behaviour, who knows?) */
-            memcpy(&kl, dp, sizeof kl);
-            dp += sizeof kl;
-            if((isins = ((kl & 0x80000000u) != 0))){
-                kl ^= 0x80000000u;
-                memcpy(&vl, dp, sizeof vl);
-                dp += sizeof vl;
-            }
-
-            key.mv_size = kl;
-            key.mv_data = dp;
-            dp += kl;
-            if(isins){
-                val.mv_size = vl;
-                val.mv_data = dp;
-                dp += vl;
-
-                e = mdb_cursor_put(bflmp->bflm_cursor, &key, &val, 0);
-                if(e != MDB_SUCCESS){
-                    emsg = "mdb_cursor_put()";
-                    goto jleave;
-                }
-            }else{
-                e = mdb_cursor_get(bflmp->bflm_cursor, &key, NULL,
-                        MDB_SET_KEY);
-                if(e != MDB_SUCCESS){
-                    emsg = "mdb_cursor_get() for delete";
-                    goto jleave;
-                }
-                e = mdb_cursor_del(bflmp->bflm_cursor, 0);
-                if(e != MDB_SUCCESS){
-                    emsg = "mdb_cursor_del()";
-                    goto jleave;
-                }
-            }
-            dp = (char*)a_BFLM_TXN_CACHE_ALIGN((uintptr_t)dp);
-        }
-    }
-
-    emsg = NULL;
-jleave:
-    return emsg;
-}
-
 static void
 a_bflm_txn_cache_free(struct a_bflm *bflmp){
     struct a_bflm_txn_cache *bflmtcp;
@@ -774,8 +781,8 @@ db_get_dbvalue(void *vhandle, const dbv_t *token, dbv_t *value){
 jleave:
     if(DEBUG_DATABASE(3))
         fprintf(dbgout, "LMDB db_get_dbvalue(): %lu <%.*s> -> %d\n",
-            (unsigned long)token->leng, (int)token->leng, (const char *)token->data,
-            (e == 0));
+            (unsigned long)token->leng, (int)token->leng,
+            (char const*)token->data, (e == 0));
     return e;
 jerr:
     if(e != MDB_NOTFOUND){
@@ -803,7 +810,7 @@ db_set_dbvalue(void *vhandle, const dbv_t *token, const dbv_t *value){
     if((bflmp = (struct a_bflm *)vhandle) == NULL)
         goto jleave;
 
-    if(bflmp->bflm_flags & a_BFLM_DB_UNAVAIL)
+    if(bflmp->bflm_flags & a_BFLM_DB_UNAVAIL) /* XXX assert instead */
         goto jleave;
 
     if((size_t)token->leng > bflmp->bflm_maxkeysize){
@@ -843,8 +850,8 @@ jredo:
 jleave:
     if(DEBUG_DATABASE(3))
         fprintf(dbgout, "LMDB db_set_dbvalue(): %lu <%.*s> -> %d\n",
-            (unsigned long)token->leng, (int)token->leng, (const char *)token->data,
-            (e == 0));
+            (unsigned long)token->leng, (int)token->leng,
+            (char const*)token->data, (e == 0));
     return e;
 jerr:
     print_error(__FILE__, __LINE__, "LMDB[%ld]: db_set_dbvalue(), %s: %d, %s",
@@ -912,8 +919,8 @@ jredo:
 jleave:
     if(DEBUG_DATABASE(3))
         fprintf(dbgout, "LMDB db_delete(): %lu <%.*s> -> %d\n",
-            (unsigned long)token->leng, (int)token->leng, (const char *)token->data,
-            (e == 0));
+            (unsigned long)token->leng, (int)token->leng,
+            (char const*)token->data, (e == 0));
     return e;
 jerr:
     print_error(__FILE__, __LINE__, "LMDB[%ld]: db_delete(), %s: %d, %s",

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


More information about the bogofilter-dev mailing list