cdb support

Matthias Andree matthias.andree at gmx.de
Thu Jul 10 02:21:09 CEST 2003


Gustaf Erikson <gustafe at home.se> writes:

> Greg Louis <glouis at dynamicro.on.ca> writes:
>
>> I haven't looked at the free cdb implementation you mention, but if
>> it's adequate, we might as well avoid the hassle altogether and go with
>> that.
>
> May I ask what's wrong with Berkeley DB?
> This is an honest question, I would like to know the issues.

First of all: there are no plans to drop Berkeley DB. The idea is to add
a choice of backends and find out whether some of them are advantageous,
at least for certain setups.

People claim Berkeley DB is fragile. There is some truth to these
claims: we don't use Berkeley DB in transactional mode, so any write
error (disk full, I/O error) can potentially damage or corrupt the
database, and I don't know how robust Berkeley DB is against kernel
crashes under write load.

Some people also suspect locking issues, but I won't track these down
until I'm presented with evidence or strong hints.

Personally, I postulate a "seed" theory. A venerable .db file that was
written by older bogofilter versions (before the locking fix) may have
suffered a corruption that doesn't show immediately but lies dormant for
a while (maybe weeks), and may grow into a real corruption later as the
DB is extended. I've seen bogofilter work well enough for weeks while,
at the same time, a regular dump (bogoutil -d) would go into an endless
loop, and only db_dump -r or even -R would help (with -R seeing mirages).

Other people, who train from "false-positive" and "false-negative"
mailboxes roughly once a day, asked whether CDB was feasible (it is) and
whether it would be faster. It does lend itself to the "static DB with
enormous reads" idea. Whether it's faster in real use, we'll see when
the CDB implementation is there.
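To illustrate why cdb suits a "static DB with enormous reads": a cdb
file is built once and afterwards only read. According to D. J.
Bernstein's cdb format description, the key's hash modulo 256 selects
one of 256 hash tables listed in a fixed header, and the hash divided by
256, modulo that table's length, gives the first slot to probe, so a
successful lookup normally takes about two disk accesses. The sketch
below shows only the hash and slot arithmetic; the function names are
hypothetical, and this is not tinycdb's API.

```c
#include <stddef.h>
#include <stdint.h>

/* cdb hash as described in the cdb format notes:
 * start at 5381, then h = ((h << 5) + h) ^ c for each byte,
 * with arithmetic modulo 2^32 (guaranteed by uint32_t). */
uint32_t cdb_hash_sketch(const unsigned char *key, size_t len)
{
    uint32_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = ((h << 5) + h) ^ key[i];
    return h;
}

/* Hypothetical helper: which of the 256 header entries (hash tables)
 * holds the key, i.e. h modulo 256. */
unsigned cdb_table_index(uint32_t h)
{
    return h & 0xffu;
}

/* Hypothetical helper: first slot to probe in a table of nslots
 * slots (caller must ensure nslots > 0), i.e. (h / 256) % nslots. */
uint32_t cdb_start_slot(uint32_t h, uint32_t nslots)
{
    return (h >> 8) % nslots;
}
```

A reader then probes that slot and the following ones until it finds the
key or hits an empty slot. Since nothing is ever updated in place, the
whole file is simply rebuilt after each training run.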

Back on the Berkeley DB issue: AFAICS, BDB performs write-ahead logging
in transactional mode, so it may actually be fast enough, and we should
try that one, too.

So this is practical research basically :-)

> As a user of OpenBSD, which takes licensing seriously, I would like
> bogofilter to use an open-source database backend (GPL or BSD
> license).

"Public domain" (tinycdb) should do; it's compatible with both of the
licenses you mention :-)

-- 
Matthias Andree
