databases

Tom Allison tallison at tacocat.net
Wed Jun 30 03:28:49 CEST 2004


I know that bogofilter uses a Berkeley DB btree database for retrieval.
I'm curious whether there would be any performance advantage to changing
this to something like PostgreSQL.  The only real advantage I can think
of is the removal of many separate physical databases for a large
central bogofilter system.

I guess the real question I should be asking is this:

What would it take to modify bogofilter such that it could be run as a 
single daemon which could be accessed either from procmail/maildrop or 
from something like amavisd-new?
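
To make the question concrete, here is a rough sketch of the shape I have
in mind, written in Python only because it is short.  Nothing in it is
bogofilter code: classify() is a hypothetical stand-in for the real
wordlist lookup, and the names start_daemon() and ask_daemon() and the
one-line "Spam"/"Ham" reply are assumptions made up for illustration.

```python
import socket
import socketserver
import threading


def classify(message: bytes) -> str:
    """Hypothetical stand-in for the real filter; NOT how bogofilter scores mail."""
    return "Spam" if b"viagra" in message.lower() else "Ham"


class FilterHandler(socketserver.StreamRequestHandler):
    """Reads one whole message from the client and answers with a one-line verdict."""

    def handle(self):
        # The client signals end-of-message by shutting down its write side.
        message = self.rfile.read()
        self.wfile.write(classify(message).encode("ascii") + b"\n")


def start_daemon(host="127.0.0.1", port=0):
    """Start the filter in a background thread; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), FilterHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_daemon(address, message: bytes) -> str:
    """What a procmail/maildrop helper or a content filter would do per message."""
    with socket.create_connection(address) as conn:
        conn.sendall(message)
        conn.shutdown(socket.SHUT_WR)  # tell the daemon the message is complete
        return conn.makefile("rb").read().decode("ascii").strip()
```

A delivery-side client would call ask_daemon(server.server_address, raw_message)
and act on the returned verdict.  The point is only that one long-running
process could hold the database open while many short-lived clients
query it.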

I find bogofilter to have better accuracy than SpamAssassin over time.
SA tends to have a sawtooth accuracy: it degrades significantly right
before each upgrade.  Bogofilter starts out really stupid, but learns
quickly and, after a point, becomes very consistent and adaptive in its
spam filtering accuracy.

Because these are my experiences, I'm leaping recklessly to the
conclusion that if bogofilter could somehow be hooked into a postfix
smtp process the way amavisd-new is, then it would be very easy to run
something like bogofilter and clamav as a postfix delivery process.

When you get into mail configurations with multi-domain hosting and
non-unix usernames (nothing in /etc/passwd), the implementation of
bogofilter starts to get "tricky".  One approach that I can conceive of
is to use a discrete userid for each domain and have that userid own the
bogofilter/procmail script for everyone on that domain.  But there is
potentially some degradation in bogofilter accuracy when the users
sharing one wordlist number in the hundreds to thousands.

This is where I think the idea of grabbing $LOGNAME from procmail and
tying it to a bogofilter wordlist is interesting.  But that's another
email that hopefully someone else can answer.
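
As a rough illustration of the $LOGNAME idea (a sketch under my own
assumptions, not a tested setup): map each name to its own wordlist
directory and hand that directory to bogofilter.  The base path, the
helper names wordlist_dir() and bogofilter_argv(), and the rule for what
counts as a safe name are all made up here.  I believe -d selects the
wordlist directory and -p is passthrough mode, but check the man page
for your version.

```python
import posixpath
import re

# Assumption: names may contain only these characters; anything else is refused.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._@+-]+")


def wordlist_dir(logname, base="/var/spool/bogofilter"):
    """Return the per-user wordlist directory for a virtual user (hypothetical layout)."""
    if (
        not logname
        or not _SAFE_NAME.fullmatch(logname)
        or logname.startswith(".")
        or ".." in logname
    ):
        # Refuse names that could escape the base directory.
        raise ValueError("unsafe user name: %r" % (logname,))
    # Lower-case so "Tom" and "tom" share one wordlist.
    return posixpath.join(base, logname.lower())


def bogofilter_argv(logname, base="/var/spool/bogofilter"):
    """Build the command line a delivery script would run for this user."""
    return ["bogofilter", "-p", "-d", wordlist_dir(logname, base)]
```

With this, bogofilter_argv("Tom", "/srv/bf") yields
["bogofilter", "-p", "-d", "/srv/bf/tom"], so every user trains and
queries a separate wordlist even though no one exists in /etc/passwd.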

Getting back to the first question, it might not be worthwhile to turn
bogofilter into some highly abstracted application like I originally
suggested (RDBMS databases and daemons) if the "intent" can be well met
using existing methods.


