cdb support

Greg Louis glouis at dynamicro.on.ca
Wed Jul 9 21:07:02 CEST 2003


On 20030709 (Wed) at 1825:54 +0200, Matthias Andree wrote:

> The datastore_cdb.c is also still missing, but it looks as though Greg
> is about to remedy this. :-)

Anyone in haste would do well not to wait for me ;)  I certainly want
to have a try at it, but unlike Gyepi and you, I'm new to this kind of
work and will advance rather slowly.  I'd hope to be compiling
(successfully, I mean) by this weekend, depending on the time I can get
for it, but till I actually start, that's just a hope, not an estimate.

BTW, last weekend I hacked (in the pejorative sense, i.e. a rush job with
little -- no, make that _no_ -- elegance) bogofilter-0.13.7.2 into
three parts (tokenizer, db lookup module and classifier -- didn't do
the registration module yet) for experimental purposes.  The tokenizer
is more or less the same as bogolexer -p, except that if you feed it
more than one message on stdin, it keeps going, emitting blank lines
between messages.  The lookup module takes the tokenizer's output on
stdin and emits what David calls message-count file format on stdout. 
And the classifier takes message-count file format on stdin and emits
message scores.  It's about a third slower than monolithic bogofilter,
but the ease with which one can swap out different modules makes it a
nice tool for testing.  (The classifier can fork tokenizer and lookup
modules so the experimenter doesn't have to type the pipe commands all
the time.)  I don't understand the lexer stuff, and there's a bit of
debugging to be done there -- there are a few non-ASCII characters I'm
dropping.  Other than that it works ok.  I'll probably implement a cdb
lookup module for this thing as practice before I try to add it into
bogofilter proper.
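
The three-stage split described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
experimental code: the record layout ("token spam_count good_count", with a
blank line between messages) is a guess at the shape of the message-count
format, the mbox-style "From " separator is assumed, the in-memory WORDLIST
stands in for a real database (Berkeley DB or cdb), and the scoring is a
plain average of per-token spam ratios rather than bogofilter's own
calculation.  All names here are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for the wordlist database:
# token -> (spam_count, good_count).
WORDLIST: Dict[str, Tuple[int, int]] = {
    "viagra": (40, 0),
    "meeting": (1, 30),
    "free": (25, 5),
}


def tokenizer(lines: Iterable[str]) -> Iterator[str]:
    """Yield one token per item; an empty string marks a message boundary."""
    seen_message = False
    for line in lines:
        if line.startswith("From "):  # assumed mbox-style separator line
            if seen_message:
                yield ""  # blank line between messages
            seen_message = True
            continue
        seen_message = True
        for tok in re.findall(r"[a-z0-9$'-]+", line.lower()):
            yield tok


def lookup(tokens: Iterable[str]) -> Iterator[str]:
    """Turn tokens into 'token spam_count good_count' records."""
    for tok in tokens:
        if tok == "":
            yield ""  # pass message boundaries through unchanged
        else:
            spam, good = WORDLIST.get(tok, (0, 0))
            yield f"{tok} {spam} {good}"


def score(ratios: List[float]) -> float:
    """Placeholder scoring: mean spam ratio, 0.5 when nothing is known."""
    return sum(ratios) / len(ratios) if ratios else 0.5


def classifier(records: Iterable[str]) -> Iterator[float]:
    """Emit one score per message from the lookup stage's records."""
    ratios: List[float] = []
    pending = False  # True once the current message has produced a record
    for rec in records:
        if rec == "":
            yield score(ratios)
            ratios, pending = [], False
            continue
        pending = True
        _tok, spam_s, good_s = rec.rsplit(" ", 2)
        spam, good = int(spam_s), int(good_s)
        if spam + good > 0:  # skip tokens the wordlist has never seen
            ratios.append(spam / (spam + good))
    if pending:
        yield score(ratios)


if __name__ == "__main__":
    mbox = ["From a@example.com", "free viagra",
            "From b@example.com", "meeting"]
    for s in classifier(lookup(tokenizer(mbox))):
        print(f"{s:.3f}")
```

Because each stage only consumes and produces lines, swapping the
dictionary-backed lookup() for one backed by another datastore leaves the
tokenizer and classifier untouched, which is the property that makes the
split convenient for testing.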

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



