multiple wordlists vs. BerkeleyDB environments
Matthias Andree
matthias.andree at gmx.de
Mon Nov 15 01:33:37 CET 2004
Ann Arbor, we've got a problem!
Apart from the open RFC-2047 issue, the multiple-wordlist feature does not
currently work, except for multiple databases in the same directory and
within the same environment (i.e. the same __db.* and log.* files). It
cannot be made to work without overturning the whole wordlist/datastore
scheme as it stands now if we want to offer system-wide databases.
The "same directory" requirement is an implementation artifact, not a
limitation of the system. The real limitations are:
1 - for a database to be shared *in a consistent way* between
applications, the applications must belong to and share *the same*
environment.
2 - for consistency (i.e. a reader must not see bogus data while a writer
    is in progress), we need transactions (the Concurrent Data Store will
    not suffice because it lacks atomicity), unless we want readers and
    writers to lock each other out, potentially for extended periods of
    time. So we cannot do without an environment.
3 - this raises an interesting new question: access control. System-wide
    databases, or at least their environment, would need to be writable
    by everyone, and any user could wreak havoc at will. Not exactly what
    we want.
I see some ways out:
A - forget about shared wordlists, fix the "same directory" bug
and move on; applications with access to the same environment trust
each other implicitly.
B - a full client-server model with access control or read-only access,
    where the server takes care of consistency, perhaps delivering
    ready-made tokens. The server may need to be multithreaded, which
    opens a new can of worms labeled "POSIX threads", unless we use a
    fork() model, which may perform poorly.
C - other suggestions?
B sounds too large for inclusion in 1.0, which has to be released some
day. A, however, should be quick to implement.
--
Matthias Andree
More information about the bogofilter-dev mailing list