Wanting a pre-db4 bogofilter
Tom Allison
tallison at tacocat.net
Thu Feb 24 13:34:42 CET 2005
Mark Constable wrote:
> This may seem odd but I'm rather perplexed at the recent
> versions of bogofilter since moving to db4 and transactions.
>
I tend to agree with your observations. The recent switch to
transactions has been rather strange, especially when you consider the
additional approach of adding a separate database connection to allow
concurrent data sharing.
Here are some questions that come up when I look back on this:
Most people use bogofilter through a local mail delivery agent (MDA)
such as procmail or maildrop. These have a built-in flock option to
prevent each user from doing concurrent processing, which would put the
bogofilter wordlist at risk of corruption. So, if you are using
procmail with file locking, do you really need a transactional Berkeley
database behind bogofilter? Most of the shared wordlists I've heard
about are read-only and so not subject to transactional problems. I
personally have not had a problem, but I'm not a high volume user by
some standards.
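As a rough illustration of the locking idea above, here is a minimal
Python sketch of serializing updates with an advisory lock. It is not
how procmail or maildrop implement their locking, and it does not touch
bogofilter itself: the helper name update_with_lock, the lock file, and
the plain-text "wordlist" are all hypothetical stand-ins. It assumes a
Unix-like system where fcntl.flock is available.

```python
import fcntl
import os
import tempfile


def update_with_lock(wordlist_path, lock_path, token):
    """Append one token to a toy wordlist while holding an exclusive lock."""
    # Hypothetical helper: a plain text file stands in for the real wordlist.
    with open(lock_path, "w") as lock_file:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            with open(wordlist_path, "a") as wordlist:
                wordlist.write(token + "\n")
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def demo():
    """Run three serialized updates and return the resulting tokens."""
    workdir = tempfile.mkdtemp()
    wordlist_path = os.path.join(workdir, "wordlist.txt")
    lock_path = os.path.join(workdir, "wordlist.lock")
    for token in ("offer", "meeting", "invoice"):
        update_with_lock(wordlist_path, lock_path, token)
    with open(wordlist_path) as wordlist:
        return wordlist.read().split()
```

With one writer at a time guaranteed this way, the data store underneath
does not need to arbitrate between concurrent updates, which is the
point of the question above.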
With the advent of a database back end being developed, the question
comes up, "Why not choose a more familiar database with all the
transactional requirements already provided?" I don't know anything
about SQLite, so I probably need to pull my foot out of my mouth on
this one, but as an example, PostgreSQL is ACID compliant, thereby
providing all the transactional database protection that anyone could
ever ask for. And it does a great job sharing data between different
platforms, including Windows.
Granted, PostgreSQL is a bit larger than Berkeley DB, but if you are
going to be using something in an environment that requires
transactional locking (implying shared files and other issues that
cannot be addressed by procmail flock) then you can probably afford the
resource investment of a client/server database like PostgreSQL. (MySQL
too, but I'm unfamiliar with it.)
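To make the "let the database handle it" idea concrete, here is a small
Python sketch using the standard sqlite3 module as a stand-in for
whichever external database is chosen. The tokens table, its columns,
and the function names are assumptions made up for this example; they
are not bogofilter's actual schema or API. The point is only that all
the count updates for one message are committed together or not at all.

```python
import sqlite3


def open_wordlist(path=":memory:"):
    """Open (or create) a toy token database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "word TEXT PRIMARY KEY, "
        "spam INTEGER NOT NULL DEFAULT 0, "
        "ham INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def register_message(conn, words, is_spam):
    """Bump the spam or ham count of every word in a single transaction."""
    column = "spam" if is_spam else "ham"  # fixed choice, not user input
    with conn:  # commits on success, rolls back if an exception is raised
        for word in words:
            conn.execute(
                "INSERT OR IGNORE INTO tokens (word) VALUES (?)", (word,)
            )
            conn.execute(
                f"UPDATE tokens SET {column} = {column} + 1 WHERE word = ?",
                (word,),
            )


def counts(conn, word):
    """Return (spam, ham) counts for a word, or (0, 0) if it is unknown."""
    row = conn.execute(
        "SELECT spam, ham FROM tokens WHERE word = ?", (word,)
    ).fetchone()
    return row if row else (0, 0)
```

If an update fails partway through a message, the rollback leaves the
earlier counts untouched, so the caller needs no extra file locking or
recovery tooling for that case.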
I've also periodically run into threads on daemonizing bogofilter.
Wouldn't this fit in better with the ideas of shared wordlists,
transactional databases, and really high performance spam filtering systems?
I don't know the answer to all these, but I wonder if the longer term
approach would be to have two basic options:
1. bogofilter compiled against a non-transactional Berkeley database
   environment, for those of us who have light servers, non-shared
   files, or are comfortable with flocking via maildrop/procmail. This
   is essentially the requested "pre-db4" software version.
2. bogofilter compiled against an external database connection, wherein
   the external database takes on all the responsibilities of concurrent
   data sharing and transactional locking, as is expected of most
   databases in the world today. This is essentially the upcoming
   SQLite version that Mathias is working on.
This would move most of the issues that have been seen here (file
space problems, installing Berkeley utilities to manage bogofilter
files, ...) out of the realm of bogofilter and into a database
management problem.
I am guessing that this is also a key component in shooting for a
daemonized bogofilter application. But that's probably going to come
much later, as we all have real lives to contend with.
Sorry to sound negative about bogofilter; it's clearly superior in its
accuracy and performance for me. But I do have to recognize that there
seems to be a step-function increase in complexity and administration
overhead.
I like zero-admin features.
I use a paper-based planner because it's low maintenance. No batteries.
No software upgrades. No crashes. And you can drop it.