Wanting a pre-db4 bogofilter

Tom Allison tallison at tacocat.net
Thu Feb 24 13:34:42 CET 2005


Mark Constable wrote:
> This may seem odd but I'm rather perplexed at the recent
> versions of bogofilter since moving to db4 and transactions.
> 

I tend to agree with your observations.  The recent switch to 
transactions has been rather strange, especially when you consider the 
additional approach of adding a separate database connection to allow 
concurrent data sharing.

Here are some questions that come up when I look back on this:

Most people run bogofilter from a local mail delivery agent (MDA) such 
as procmail or maildrop.  These have a built-in flock option that keeps 
each user's deliveries from running concurrently, which would otherwise 
put the bogofilter wordlist at risk of corruption.  So, if you are using 
procmail with file locking, do you really need a transactional Berkeley 
database behind bogofilter?  Most of the shared wordlists I've heard 
about are read-only and therefore not subject to transactional 
problems.  I personally have not had a problem, but I'm not a 
high-volume user by some standards.
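
To make the locking argument concrete, here is a minimal Python sketch 
of serializing wordlist updates behind an exclusive file lock.  It is an 
illustration only: the JSON "wordlist", the file names, and the 
update_counts() helper are all hypothetical and are not bogofilter's 
format or API.  procmail and maildrop have their own locking 
facilities, which this does not reproduce.

```python
import fcntl
import json
import os


def update_counts(wordlist_path, lock_path, tokens):
    """Increment each token's count while holding an exclusive lock.

    Hypothetical helper: the wordlist here is a plain JSON file, not a
    real bogofilter database.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock, so only one
        # updater touches the wordlist at a time.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            counts = {}
            if os.path.exists(wordlist_path):
                with open(wordlist_path) as f:
                    counts = json.load(f)
            for token in tokens:
                counts[token] = counts.get(token, 0) + 1
            # Write to a temporary file and rename it into place so a
            # reader never sees a half-written wordlist.
            tmp_path = wordlist_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(counts, f)
            os.replace(tmp_path, wordlist_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return counts
```

With every writer going through the same lock, updates happen one at a 
time, which is the protection the MDA lock is being relied on for.  It 
does nothing for a process that bypasses the lock, and that is the gap 
a transactional database is meant to close.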

With a database back end being developed, the question comes up, "Why 
not choose a more familiar database with all the transactional 
requirements already provided?"  I don't know anything about SQLite, so 
I may need to pull my foot out of my mouth on this one, but as an 
example, PostgreSQL is ACID compliant, thereby providing all the 
transactional protection that anyone could ever ask for.  And it does a 
great job of sharing data between different platforms, including 
Windows.

Granted, PostgreSQL is a bit larger than Berkeley DB, but if you are 
going to be using something in an environment that requires 
transactional locking (implying shared files and other issues that 
cannot be addressed by procmail flock) then you can probably afford the 
resource investment of a client/server database like PostgreSQL.  
(MySQL too, but I'm unfamiliar with it.)

I've also periodically run into threads on daemonizing bogofilter. 
Wouldn't this fit in better with the ideas of shared wordlists, 
transactional databases, and really high performance spam filtering systems?

I don't know the answers to all of these, but I wonder if the 
longer-term approach would be to have two basic options:

bogofilter compiled against a non-transactional Berkeley database 
environment, for those of us who have light servers, non-shared files, 
or are comfortable with flocking via maildrop/procmail.  This is 
essentially the requested "pre-db4" software version.

bogofilter compiled against an external database connection, wherein 
the external database takes on all the responsibilities of concurrent 
data sharing and transactional locking, as is expected of most 
databases in the world today.  This is essentially the upcoming SQLite 
version that Mathias is working on.

This would separate out most of the issues that have been seen here 
(file space problems, installing Berkeley utilities to manage 
bogofilter files....) and turn them into a database management problem 
rather than something within the realm of bogofilter.
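
As a rough illustration of the second option, the following Python 
sketch lets the database own the transaction.  It uses Python's 
standard sqlite3 module; the table layout and the open_wordlist(), 
register(), and counts() helpers are assumptions made up for this 
example and say nothing about how the SQLite back end for bogofilter is 
actually designed.

```python
import sqlite3


def open_wordlist(path):
    """Open (or create) a hypothetical token-count table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        " token TEXT PRIMARY KEY,"
        " spam INTEGER NOT NULL DEFAULT 0,"
        " ham INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def register(conn, tokens, is_spam):
    """Count one message's tokens as spam or ham, all or nothing."""
    column = "spam" if is_spam else "ham"  # fixed names, never user input
    # The connection used as a context manager commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        for token in tokens:
            conn.execute(
                "INSERT OR IGNORE INTO tokens (token) VALUES (?)", (token,)
            )
            conn.execute(
                f"UPDATE tokens SET {column} = {column} + 1 WHERE token = ?",
                (token,),
            )


def counts(conn, token):
    """Return (spam, ham) for a token, or (0, 0) if it is unknown."""
    row = conn.execute(
        "SELECT spam, ham FROM tokens WHERE token = ?", (token,)
    ).fetchone()
    return row if row else (0, 0)
```

The point of the sketch is the division of labour: the caller just 
registers a message, and locking, commit, and rollback are the 
database's problem instead of bogofilter's or the administrator's.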

I am guessing that this is also a key component in shooting for a 
daemonized bogofilter application.  But that's probably going to come 
much later, as we all have real lives to contend with.

Sorry to sound negative about bogofilter; it's clearly superior in its 
accuracy and performance for me.  But I do have to recognize that there 
seems to be a step-function increase in complexity and administration 
overhead.

I like zero-admin features.

I use a paper-based planner because it's low maintenance.  No batteries. 
No software upgrades.  No crashes.  And you can drop it.
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter