cdb preliminary results

Greg Louis glouis at dynamicro.on.ca
Fri Jul 11 12:47:08 CEST 2003


On 20030711 (Fri) at 0245:21 +0200, Matthias Andree wrote:
> People would use batch mode updates for these data bases.
> 
> If we were to support that, we'd rather write change instructions in
> append-mode to a file, and at some time, merge these change instructions
> into the source for cdb and recompile the cdb.

I'd been thinking in terms of building a hash in memory, then reading
the old .cdb file and merging the changes into a new .cdb, and finally
traversing the hash to add any new records.  Clearly not something
you'd do for one message at a time -- batch mode only.  That way saves
having to maintain a separate source file, though you'd still need
double the disk space while building the new .cdb.
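
To make the idea concrete, here is a rough sketch of the merge pass in
Python. It is only an illustration under assumptions of my own: records
are modelled as (token, count) pairs, each change is a delta added to
the old count, and merge_batch is a hypothetical name. No real cdb
library is used; reading the old file and writing the new one are left
to the caller.

```python
def merge_batch(old_records, changes):
    """Yield the records for the new file.

    old_records: iterable of (token, count) pairs read from the old file.
    changes:     in-memory hash mapping token -> count delta (assumed format).
    """
    pending = dict(changes)  # copy, so the caller's hash is left untouched
    for token, count in old_records:
        # pop() applies the change and marks it as consumed in one step
        yield token, count + pending.pop(token, 0)
    # Anything still pending never appeared in the old file: new records.
    for token, delta in pending.items():
        yield token, delta


# Made-up sample data, purely for illustration.
old = [("free", 10), ("hello", 3)]
changes = {"free": 2, "offer": 5}
merged = list(merge_batch(old, changes))
```

Because the old records are streamed rather than loaded, only the batch
of changes has to fit in memory; the old and new files coexist on disk
until the new one is complete, which is where the doubled disk space
comes from.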

WRT dbt_v vs ASCII: for my experimental multiprocess version ASCII is
actually faster for reading, because the data get piped to the
classifier in msg-count format -- storing them in ASCII saves a
conversion.  In monolithic bogofilter, it adds a conversion, so there
your point about speed is well taken.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



