bogofilter now has tdb support

Gyepi SAM gyepi at praxis-sw.com
Sat Jul 5 22:01:31 CEST 2003


Greetings!

I have added trivial database (http://sourceforge.net/projects/tdb)
support to bogofilter. The Berkeley DB backend has never truly pleased me,
and it pleases me less whenever I find my databases corrupted for the nth time.

Since I proposed the whole datastore concept, I also felt that it would
only be fitting to test the idea with a second implementation, which
will hopefully be faster, easier to use, and less corruptible.

The initial code is available in cvs under the 'datastore_tdb' branch.
It should build fine if you have tdb installed in a standard location.

If you are interested in using it, I would recommend checking out a
fresh copy to avoid confusion with your regular bogofilter setup.

To build a tdb bogofilter, pass '--with-tdb' to configure.
Note that BDB is not used in that build.

Some areas that need further work/thought/discussion include:

1. datastore_db.c contained a lot of generic datastore code, which has
now been moved into datastore.c. We should take a look at the current
design and determine whether it is the best approach. I am not fond of
the db_getvalue/db_get_dbvalue and db_setvalue/db_set_dbvalue pairs.
They obscure the code path and are convoluted.

2. Ideally, one should be able to build in both databases and choose the
preferred one at run time, either by specifying it at the command line
or based on the database filename.

This would probably require the use of a dynamic link loader (man dlopen)
in addition to allowing the datastore to decide on the database
filename, or at the very least, file extension. Barring that, we'd need
to determine the database type using some kind of magic.

This would solve the problem of trying to figure out which database is
currently supported since the binaries have the same names.
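
To make the run-time selection in #2 concrete, here is a minimal sketch
of choosing a backend either by name (as from a command-line option) or
by file extension. Everything in it is hypothetical: the ds_backend_t
type, the function names, and the '.db' and '.tdb' extensions are
assumptions for illustration, not existing bogofilter code. It leaves
out the dlopen part entirely and only shows the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend descriptor; not an existing bogofilter type. */
typedef struct {
    const char *name;       /* label usable on the command line */
    const char *extension;  /* filename suffix claimed by the backend */
} ds_backend_t;

/* Backends compiled into this binary. The extensions are assumptions. */
static const ds_backend_t ds_backends[] = {
    { "bdb", ".db"  },
    { "tdb", ".tdb" },
};

#define DS_NBACKENDS (sizeof ds_backends / sizeof ds_backends[0])

/* Look up a backend by explicit name; NULL if it was not built in. */
const ds_backend_t *ds_backend_by_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < DS_NBACKENDS; i++) {
        if (strcmp(name, ds_backends[i].name) == 0)
            return &ds_backends[i];
    }
    return NULL;
}

/* Pick the backend whose extension matches the end of `path`;
 * NULL if no built-in backend claims the file. */
const ds_backend_t *ds_select_backend(const char *path)
{
    size_t plen, i;

    assert(path != NULL);
    plen = strlen(path);
    for (i = 0; i < DS_NBACKENDS; i++) {
        size_t elen = strlen(ds_backends[i].extension);

        if (plen > elen &&
            strcmp(path + plen - elen, ds_backends[i].extension) == 0)
            return &ds_backends[i];
    }
    return NULL;
}
```

With such a table, a caller would try ds_backend_by_name() when the user
names a backend explicitly and fall back to ds_select_backend() on the
wordlist path otherwise. A magic-number check on the file contents could
replace the extension test without changing the callers.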

3. The resolution to #2 should solve the problem of the rpm package
   names too.

4. configure.ac needs to be hacked further to allow the user to specify
the path to the (non-standard) tdb installation directory.

5. We need a pair of generic pack/unpack utilities to replace the use of
the cv[2] array. Using an array is rather kludgey and inflexible. I can
see the day when we may want to store multiple fields in each row.

5a. If we do #5, we should also store a pseudo row, like .MSG_COUNT, to
hold the pack template, i.e. .PACK_TEMPLATE would currently be '%d%d',
which means that each record is packed with two ints. That way we can
add new columns very easily, as long as each addition has a default
value.
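
As a rough illustration of #5 and #5a, the sketch below packs and
unpacks a row according to a '%d%d'-style template. The function names,
the choice of 32-bit big-endian fields, and the single default value for
missing columns are all assumptions made for the example; nothing here
is existing bogofilter code, and a real version would likely support
more field types than '%d'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the "%d" fields in a template such as "%d%d".
 * Returns -1 if the template contains anything else. */
int tmpl_field_count(const char *tmpl)
{
    int n = 0;

    assert(tmpl != NULL);
    while (*tmpl != '\0') {
        if (tmpl[0] != '%' || tmpl[1] != 'd')
            return -1;
        tmpl += 2;
        n++;
    }
    return n;
}

/* Pack one value per template field into `buf` as big-endian 32-bit
 * integers. Returns the number of bytes written, or -1 on error. */
int row_pack(const char *tmpl, const int32_t *vals,
             unsigned char *buf, size_t buflen)
{
    int i, n = tmpl_field_count(tmpl);

    if (n < 0 || (size_t)n * 4 > buflen)
        return -1;
    for (i = 0; i < n; i++) {
        uint32_t v = (uint32_t)vals[i];

        buf[4 * i + 0] = (unsigned char)(v >> 24);
        buf[4 * i + 1] = (unsigned char)(v >> 16);
        buf[4 * i + 2] = (unsigned char)(v >> 8);
        buf[4 * i + 3] = (unsigned char)v;
    }
    return n * 4;
}

/* Unpack a stored row according to the current template. Fields that
 * an older, shorter row does not contain are set to `dflt`.
 * Returns the number of fields filled in, or -1 on a bad template. */
int row_unpack(const char *tmpl, const unsigned char *buf, size_t len,
               int32_t *vals, int32_t dflt)
{
    int i, n = tmpl_field_count(tmpl);

    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        if ((size_t)(i + 1) * 4 <= len) {
            uint32_t v = ((uint32_t)buf[4 * i + 0] << 24) |
                         ((uint32_t)buf[4 * i + 1] << 16) |
                         ((uint32_t)buf[4 * i + 2] << 8)  |
                          (uint32_t)buf[4 * i + 3];
            vals[i] = (int32_t)v;
        } else {
            vals[i] = dflt;
        }
    }
    return n;
}
```

The point of the default in row_unpack() is the one made in 5a: a row
written under '%d%d' can still be read after the template grows to
'%d%d%d', with the new column taking its default value.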

I will be away from the computer till Monday but hope to take
advantage of your copious free cycles during the weekend ;)

I know there's a lot to chew on here. Feel free to focus responses on
specific parts.

enjoy

-Gyepi




