BerkeleyDB

Adrian Otto aotto at aotto.com
Thu Sep 19 07:46:06 CEST 2002


Gyepi,

Actually, what I had in mind is a generic abstraction, much like what your
header file describes. Normally, I would suggest an OO approach, but we are
already most of the way there with functional abstraction, so we might just as
well finish it that way. Ideally, we should have an internal interface to
the data that's implementation independent. There should be a way to create
and return an object pointer, which is then passed to functions that will
operate on it. The goal is to have an implementation that does not require
expensive code changes in order to implement a different database. Ideally,
only these functions would change.
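
A minimal sketch of that idea, assuming an opaque handle that carries a small
table of backend operations. Every name here (word_db_t, word_db_ops_t,
wdb_open_mem, wdb_get, wdb_set, wdb_close) is hypothetical and not taken from
the bogofilter source, and the in-memory backend only stands in for a real
database such as BerkeleyDB:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in bogofilter. */
typedef struct word_db word_db_t;

/* Operations that every backend would have to supply. */
typedef struct {
    int  (*get)(void *state, const char *word, long *value);
    int  (*set)(void *state, const char *word, long value);
    void (*close)(void *state);
} word_db_ops_t;

struct word_db {
    const word_db_ops_t *ops;  /* backend-specific functions */
    void *state;               /* backend-specific data */
};

/* Toy in-memory backend, standing in for a real database. */
#define MEM_MAX_WORDS 64
#define MEM_MAX_LEN   32

typedef struct {
    char   words[MEM_MAX_WORDS][MEM_MAX_LEN];
    long   values[MEM_MAX_WORDS];
    size_t used;
} mem_state_t;

static int mem_get(void *state, const char *word, long *value)
{
    mem_state_t *m = state;
    for (size_t i = 0; i < m->used; i++) {
        if (strcmp(m->words[i], word) == 0) {
            *value = m->values[i];
            return 0;
        }
    }
    return -1;  /* word not present */
}

static int mem_set(void *state, const char *word, long value)
{
    mem_state_t *m = state;
    for (size_t i = 0; i < m->used; i++) {
        if (strcmp(m->words[i], word) == 0) {
            m->values[i] = value;
            return 0;
        }
    }
    if (m->used == MEM_MAX_WORDS || strlen(word) >= MEM_MAX_LEN)
        return -1;  /* table full or word too long */
    strcpy(m->words[m->used], word);
    m->values[m->used] = value;
    m->used++;
    return 0;
}

static void mem_close(void *state) { free(state); }

static const word_db_ops_t mem_ops = { mem_get, mem_set, mem_close };

/* Create a handle bound to the in-memory backend; NULL on failure. */
word_db_t *wdb_open_mem(void)
{
    word_db_t *db = malloc(sizeof *db);
    mem_state_t *m = calloc(1, sizeof *m);
    if (db == NULL || m == NULL) {
        free(db);
        free(m);
        return NULL;
    }
    db->ops = &mem_ops;
    db->state = m;
    return db;
}

/* Callers only ever touch these wrappers, never a backend directly. */
int wdb_get(word_db_t *db, const char *word, long *value)
{
    assert(db != NULL);
    return db->ops->get(db->state, word, value);
}

int wdb_set(word_db_t *db, const char *word, long value)
{
    assert(db != NULL);
    return db->ops->set(db->state, word, value);
}

void wdb_close(word_db_t *db)
{
    if (db == NULL)
        return;
    db->ops->close(db->state);
    free(db);
}
```

With this shape, switching databases would mean writing another ops table and
another open function; code that calls wdb_get() and wdb_set() would not
need to change.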

Dumping of the data should really be done with a separate import/export
utility. This way, we won't need to carry this code in bogofilter. Although
it's not much code, every little bit that we can keep out helps. The utility
should use the same interface and the same source file that holds these
functions. The functions should probably live in their own source file, with
the corresponding header included in bogofilter.c, so that this code reuse
is simplified.

We should use a consistent approach to do the following sorts of operations:
- open_database() for creating/returning the database object pointer
- get_word_value() for getting the value of a word (already exists)
- set_word_value() for setting the value of a word (already exists)
- increment_word_value() first get_word_value(), increment, then
  set_word_value()
- decrement_word_value() first get_word_value(), decrement, then
  set_word_value()
- close_database() for committing/closing the database object.
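
The increment and decrement operations in the list above could be sketched as
thin wrappers over get and set. The signatures are guesses (the existing
get_word_value() and set_word_value() may look different), the array-backed
word_db_t is a hypothetical stand-in for the real store, and two behaviours are
assumptions: a missing word counts as zero on increment, and decrement never
goes below zero or creates a new entry.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in store; a real one would wrap the database. */
#define MAX_WORDS 64
#define MAX_LEN   32

typedef struct {
    char   words[MAX_WORDS][MAX_LEN];
    long   values[MAX_WORDS];
    size_t used;
} word_db_t;

/* Returns 0 and fills *value if the word exists, -1 otherwise. */
int get_word_value(const word_db_t *db, const char *word, long *value)
{
    assert(db != NULL && word != NULL && value != NULL);
    for (size_t i = 0; i < db->used; i++) {
        if (strcmp(db->words[i], word) == 0) {
            *value = db->values[i];
            return 0;
        }
    }
    return -1;
}

/* Stores value for word, adding the word if needed. 0 on success. */
int set_word_value(word_db_t *db, const char *word, long value)
{
    assert(db != NULL && word != NULL);
    for (size_t i = 0; i < db->used; i++) {
        if (strcmp(db->words[i], word) == 0) {
            db->values[i] = value;
            return 0;
        }
    }
    if (db->used == MAX_WORDS || strlen(word) >= MAX_LEN)
        return -1;  /* table full or word too long */
    strcpy(db->words[db->used], word);
    db->values[db->used] = value;
    db->used++;
    return 0;
}

/* get, add, set. Assumption: an absent word starts from zero. */
int increment_word_value(word_db_t *db, const char *word, long by)
{
    long value = 0;
    (void)get_word_value(db, word, &value);  /* leaves 0 if absent */
    return set_word_value(db, word, value + by);
}

/* get, subtract, set. Assumption: absent words are left alone and
   the stored count is clamped at zero. */
int decrement_word_value(word_db_t *db, const char *word, long by)
{
    long value;
    if (get_word_value(db, word, &value) != 0)
        return 0;  /* nothing to decrement */
    value = (value > by) ? value - by : 0;
    return set_word_value(db, word, value);
}
```

Because both helpers go through get and set only, a new backend would get them
for free once those two functions work.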

The names of the current functions don't appear to follow a consistent
pattern, which makes sense if you understand the history of bogofilter, but
the function naming should be reviewed to make sure that it all still makes
sense. For instance, we have a function named 'bogofilter()' which should
probably be named something more descriptive like evaluate_spamicity().

Also, it would be nice if each function had a big banner comment before it
that says what the function is for, what calls it, and why. This will make
the source quick and easy to understand for new developers who may join the
project in the future.
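
One possible shape for such a banner, shown on a made-up helper. The function
clamp_count() and the caller it mentions are purely illustrative and do not
exist in bogofilter; only the layout of the comment is the point.

```c
#include <assert.h>

/*********************************************************************
 * clamp_count()      (hypothetical helper, for illustration only)
 *
 * What it does:  Restricts a word count to the range [0, max].
 * Who calls it:  Would be called by code that writes counts back to
 *                a word list, so that a bad decrement cannot store
 *                a negative count.
 * Returns:       The clamped count.
 *********************************************************************/
long clamp_count(long count, long max)
{
    assert(max >= 0);
    if (count < 0)
        return 0;
    if (count > max)
        return max;
    return count;
}
```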

Anyone interested in tackling this (small) effort?

Thanks,

Adrian


> -----Original Message-----
> From: Gyepi SAM [mailto:gyepi at praxis-sw.com]
> Sent: Wednesday, September 18, 2002 8:02 PM
> To: bogofilter-dev at aotto.com
> Subject: Re: BerkeleyDB
>
>
> On Wed, Sep 18, 2002 at 07:28:43PM -0700, Adrian Otto wrote:
> > > ESR initially tried the autodaemon approach, which
> > > incurs that cost once, but decided that it was a bad idea and
> > > instead accepted my DB3 patch.
> > Once we have all the functionality in the system that we want, we should
> > revisit this issue, and use some of the very good suggestions that have
> > been discussed on this list. As long as we properly abstract the use of
> > the BerkeleyDB code so that it can be easily replaced in the future, it
> > should be safe to leave it in place.
> >
> > I do advocate a limited scope effort to make abstractions for
> > get_word_value and set_word_value types of operations. This is already
> > present to some degree, but could be abstracted a little bit more to make
> > future changes easier.
>
> I sent a patch to ESR about a week ago, which abstracts the database calls
> so that one could easily replace the database by implementing the
> interface. The patch will not apply cleanly to the current cvs version so
> I need to rework it. Here's the header file from my original
> implementation. I would certainly appreciate comments on this.
>
>
> /* API for bogofilter datastore. If you write a datastore, it
>    MUST have this interface */
>
> #ifndef DATASTORE_H_GUARD
> #define DATASTORE_H_GUARD
>
> #include "bogofilter.h"
>
> /* Initialize datastore.
>    Return 0 on success, 1 otherwise.
>    On input, list->name, list->file, list->count_file are set to
>    correct values.
>    On output, list->db and list->msgcount should be set to
>    correct values on success.
> */
> int datastore_init(wordlist_t *);
>
> /* Increments freq for given key. */
> void datastore_increment(wordlist_t *, char *,wordprop_t *);
>
> /* Decrement count for a given word_prop_t, if it exists in the
>    datastore. */
> void datastore_decrement(wordlist_t *, char *,wordprop_t *);
>
> /* get the count associated with a given word in a list */
> int datastore_getcount(wordlist_t *, char *);
>
> /* Allows the datastore to close files and clean up. */
> void  datastore_deinit(wordlist_t *);
>
> /* Dumps state of datastore to stdout */
> int datastore_dump(char * /* filename */);
>
> #endif
>
>


