qdbm tuning

Matthias Andree matthias.andree at gmx.de
Mon Sep 8 23:06:37 CEST 2003


Stefan Bellon <sbellon at sbellon.de> writes:

>> I don't think we should stuff that re-organization code into
>> db_set_setvalue, or is the dpbnum so cheap we can afford it?
>
> Yes, it's in O(1).
>
>> I'd think we'd better use db_close to reorganize. Objections?
>
> Yes. I register 100 MB of spam/ham in one go.

How many tokens in how many mails is that?

> This totally breaks down performance if I don't reorganize every now
> and then.

Makes me wonder whether we should really make two calls into qdbm for
each token added, or whether it's sufficient to optimize once per mail
registered -- the latter will be easy as we'll have transactional
interfaces someday anyway.
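
A rough sketch of the once-per-mail variant, to make the idea concrete.
Everything here is hypothetical: struct wordlist is a stand-in for the
real database handle, and growing to twice the record count is an
assumed policy, not something qdbm prescribes. With depot, the two
counters would presumably come from dprnum() and dpbnum(), and the
resize itself from dpoptimize().

```c
#include <assert.h>

/* Hypothetical stand-in for the database handle; not a qdbm type. */
struct wordlist {
    long records;          /* tokens stored so far */
    long buckets;          /* current size of the bucket array */
    int  reorganizations;  /* how often we had to resize */
};

/* Resize only when the used/buckets ratio exceeds max_ratio.
 * Assumed policy: grow to twice the current record count. */
static void maybe_reorganize(struct wordlist *w, double max_ratio)
{
    assert(w != 0 && max_ratio > 0.0);
    if ((double)w->records > max_ratio * (double)w->buckets) {
        w->buckets = 2 * w->records;
        w->reorganizations++;
    }
}

/* Register one mail: add all of its tokens, then check the ratio once,
 * instead of checking after every single token. */
static void register_mail(struct wordlist *w, long new_tokens,
                          double max_ratio)
{
    w->records += new_tokens;   /* stands in for one store call per token */
    maybe_reorganize(w, max_ratio);
}
```

The check still costs O(1), but it now runs once per mail rather than
once per token, so the extra call into the database is amortized over
all tokens of that mail.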

> I'm just not sure whether a used/buckets ratio of 1.25 is a good
> threshold value. I know of papers that say you have to reorganize as
> soon as the ratio gets to 0.80.

Go figure ;-)

> You can calculate the problem for yourself. If only 1913 buckets are
> available at the beginning and I want to feed over 300000 words into
> the word lists, then this is no good.
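
For the record, the calculation can be sketched as below. The helper
names are made up for illustration, and the two thresholds are simply
the 1.25 and 0.80 figures mentioned above; nothing here calls into qdbm.

```c
#include <assert.h>

/* Ratio of stored records to hash buckets (the "used/buckets" figure). */
static double load_ratio(long records, long buckets)
{
    assert(buckets > 0);
    return (double)records / (double)buckets;
}

/* Smallest bucket count that keeps records/buckets at or below
 * max_ratio. */
static long buckets_needed(long records, double max_ratio)
{
    long n;

    assert(records >= 0 && max_ratio > 0.0);
    n = (long)((double)records / max_ratio);
    /* Round up if truncation left us one bucket short. */
    while ((double)n * max_ratio < (double)records)
        n++;
    return n;
}
```

With 300000 words in 1913 buckets, load_ratio() gives roughly 157
records per bucket. Keeping the ratio at or below 1.25 takes 240000
buckets; keeping it at or below 0.80 takes 375000.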

And if we switched from depot to villa? :->

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95



