multiple wordlists vs. BerkeleyDB environments

Matthias Andree matthias.andree at gmx.de
Mon Nov 15 01:33:37 CET 2004


Ann Arbor, we've got a problem!

Apart from the open RFC-2047 issue, multiple wordlists aren't working
at this time, except for multiple databases in the same directory and
within the same environment (= __db.* and log.* files) - and they cannot
be made to work without overturning the whole wordlists/datastore scheme
as it stands now if we want to offer system-wide databases.

The "same directory" restriction is an implementation artifact and not a
limitation of the system - the real limitations are:

1 - for a database to be shared *in a consistent way* between
    applications, the applications must belong to and share *the same*
    environment.

2 - for consistency (i.e. not seeing bogus data in a reader while a
    writer is in progress), we need transactions (the concurrent
    datastore will not suffice for lack of atomicity), unless we want
    readers and writers to lock each other out, potentially for extended
    periods of time. So we cannot do without an environment.

3 - this raises a new, interesting question: access control. System-wide
    databases would need to be writable by anyone - at least the
    environment would - and users could wreak havoc at will. Not exactly
    what we want.
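
To make point 3 concrete: a minimal sketch, in Python for brevity, that
lists which environment files in a directory any local user may write
to. It only inspects permission bits and does not touch Berkeley DB
itself. The function name is hypothetical; the __db.* and log.* patterns
are the environment files named above.

```python
import os
import stat
from pathlib import Path


def world_writable_env_files(env_dir):
    """Return the names of environment files (__db.*, log.*) in env_dir
    whose mode grants write permission to "other" users."""
    hits = []
    for pattern in ("__db.*", "log.*"):
        for path in sorted(Path(env_dir).glob(pattern)):
            # S_IWOTH is the "writable by others" permission bit.
            if path.stat().st_mode & stat.S_IWOTH:
                hits.append(path.name)
    return hits
```

Any name this returns for a system-wide environment is a file that every
local user could corrupt, which is the access control problem in
practice.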

I see some ways out:

A - forget about shared wordlists, fix the "same directory" bug
    and move on; applications with access to the same environment trust
    each other implicitly.

B - full client-server model with access control or read-only access
    that takes care of consistency, perhaps delivering ready-made
    tokens. May need to be multithreaded, which opens a new can of worms
    labeled "POSIX threads", unless we want to use a fork() model, which
    may imply poor performance.

C - other suggestions?

B sounds too large for inclusion in 1.0, which must become ready some
day. A should be quick to implement, though.
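
For scale, here is a rough sketch of the fork() variant of B, in Python
for brevity and assuming a POSIX system. Everything in it is
hypothetical: an in-memory dict stands in for the wordlist database, a
socketpair stands in for the client connection, and the one-line
"token in, counts out" protocol is invented for illustration. The server
forks one child per request, and the child only ever reads.

```python
import os
import socket

# Hypothetical stand-in for a wordlist: token -> (spam count, ham count).
WORDLIST = {"viagra": (120, 1), "meeting": (2, 95)}


def serve_one(conn, wordlist):
    """Fork a child that answers a single read-only lookup on conn."""
    pid = os.fork()
    if pid == 0:
        # Child: read one token line, reply with its counts, then exit
        # without returning into the caller's code.
        try:
            with conn.makefile("r") as rfile:
                token = rfile.readline().strip()
            spam, ham = wordlist.get(token, (0, 0))
            conn.sendall(f"{spam} {ham}\n".encode())
        finally:
            conn.close()
            os._exit(0)
    return pid  # parent: hand the child's pid back to the caller


def lookup(token, wordlist=WORDLIST):
    """Client side: send one token, read back (spam, ham) counts."""
    client, server = socket.socketpair()
    pid = serve_one(server, wordlist)
    server.close()  # the parent no longer needs the server end
    with client, client.makefile("r") as rfile:
        client.sendall(token.encode() + b"\n")
        reply = rfile.readline()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    spam, ham = reply.split()
    return int(spam), int(ham)


if __name__ == "__main__":
    print(lookup("viagra"))
```

Each lookup here costs a process creation plus a round trip, which is
where the performance worry comes from; a real server would at least
keep one child per client connection rather than one per token.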

-- 
Matthias Andree


