databases
Tom Allison
tallison at tacocat.net
Wed Jun 30 03:28:49 CEST 2004
I know that bogofilter uses a Berkeley DB btree database for retrieval.
I'm curious whether there would be any performance advantage to changing
this to something like Postgres. The only real advantage I can think of
is the removal of many separate physical databases for a large central
bogofilter system.
I guess the real question I should be asking is this:
What would it take to modify bogofilter such that it could be run as a
single daemon which could be accessed either from procmail/maildrop or
from something like amavisd-new?
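To make the daemon question concrete, here is a minimal sketch of the interface such a service might expose. Everything in it is an assumption for illustration: the socket path, the one-message-per-connection protocol, and the function names are hypothetical, and bogofilter itself provides no such daemon. It still starts one bogofilter process per message, so it shows only the shape of the client interface, not a performance gain.

```python
import socket
import subprocess


def run_bogofilter(message):
    """Classify one message by piping it to the bogofilter binary."""
    # Assumes bogofilter is on PATH; its exit status is the classification.
    return subprocess.run(["bogofilter"], input=message).returncode


def handle_client(conn, classify=run_bogofilter):
    """Read one message until the client half-closes, then reply with the status."""
    chunks = []
    while True:
        data = conn.recv(65536)
        if not data:
            break
        chunks.append(data)
    status = classify(b"".join(chunks))
    # Hypothetical reply format: the numeric status followed by a newline.
    conn.sendall(b"%d\n" % status)
    conn.close()


def serve(path="/tmp/bogofilter.sock"):
    """Accept clients on a Unix socket, one at a time (hypothetical path)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen()
        while True:
            conn, _ = server.accept()
            handle_client(conn)
```

A client called from procmail, maildrop, or a content filter would connect, send the message, shut down its write side, and read back one line. A real daemon would presumably keep the wordlist open between messages, which is where any speedup would come from.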
I find bogofilter to be more accurate than spamassassin over time.
SA tends to have a sawtooth accuracy where it degrades significantly
right before each upgrade. Bogofilter starts out really stupid, but
learns quickly and, after a point, becomes very consistent and adaptive
in its spam filtering accuracy.
Because these are my experiences I'm leaping recklessly to the
conclusion that if bogofilter could somehow be included into a postfix
smtp process like amavisd-new, then it would be very easy to run
something like bogofilter and clamav as a postfix delivery process.
When you get into mail configurations of multi-domain hosting and
non-unix usernames (nothing in /etc/passwd) the implementation of
bogofilter starts to get "tricky". One approach that I can conceive of
is to use a discrete userid for each domain and have that userid own the
bogofilter/procmail script for everyone on that domain. But there is
potentially some degradation in bogofilter accuracy when the users
sharing a wordlist number in the hundreds to thousands.
This is where I think the process of grabbing the $LOGNAME from procmail
and tying that to a bogofilter wordlist is interesting. But that's
another email that hopefully someone else can answer.
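As a rough sketch of the $LOGNAME idea, a small wrapper could map each user name to its own wordlist directory and pass it to bogofilter with the -d option. The root directory, the function names, and the rule for what counts as a safe user name are all assumptions made for illustration.

```python
import os
import re
import subprocess

# Hypothetical base directory holding one wordlist directory per user.
WORDLIST_ROOT = "/var/spool/bogofilter"


def wordlist_dir(logname, root=WORDLIST_ROOT):
    """Map a login name to its own wordlist directory under root."""
    # Reject names that could escape the root (slashes, "..", empty names).
    if not re.fullmatch(r"[A-Za-z0-9._@-]+", logname) or logname in (".", ".."):
        raise ValueError("unsafe user name: %r" % logname)
    return os.path.join(root, logname)


def bogofilter_command(logname, root=WORDLIST_ROOT):
    """Build the bogofilter command line for one user; -d selects the directory."""
    return ["bogofilter", "-d", wordlist_dir(logname, root)]


def classify(message_bytes, logname, root=WORDLIST_ROOT):
    """Pipe one message to bogofilter and return its exit status."""
    result = subprocess.run(bogofilter_command(logname, root), input=message_bytes)
    return result.returncode
```

A procmail recipe could pipe each message to a script that calls classify() with os.environ["LOGNAME"]. Virtual users then need no entry in /etc/passwd, only a directory under the chosen root.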
Getting back to the first question, it might not be worthwhile to turn
bogofilter into some highly abstracted application like I originally
suggested (RDBMS databases and daemons) if the "intent" can be well met
using existing methods.