qdbm tuning

Stefan Bellon sbellon at sbellon.de
Tue Sep 9 07:13:12 CEST 2003


Matthias Andree wrote:
> Stefan Bellon <sbellon at sbellon.de> writes:

[snip]

> > Yes. I register 100 MB of spam/ham in one go.

> How many tokens in how many mails is that?

08 Sep 08:36:56 006 register-n, 162738 words, 18691 messages
08 Sep 09:16:54 006 register-s, 159930 words, 5650 messages

> > This totally breaks down performance if I don't organize every now
> > and then.

> Makes me wonder if we should really make two calls into qdbm for each
> token added or if it's sufficient to optimize once per mail registered
> -- the latter will be easy as we'll have transactional interfaces
> someday anyways.

Ok, once per mail would be ok as well. But once per batch seems very
wrong to me.
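
To make the three options concrete (per token, per mail, per batch), here is a minimal sketch in Python. It is not bogofilter code: `register_batch`, `maybe_optimize` and the `when` parameter are hypothetical names, and a plain dict stands in for the qdbm database. It only counts how often the "should we reorganize?" hook would run under each policy.

```python
def register_batch(mails, store, maybe_optimize, when="mail"):
    """Count tokens from `mails` into `store` and call `maybe_optimize`
    per token, per mail, or once per batch. Returns the number of calls.

    `mails` is a list of token lists; `store` is a dict standing in for
    the database; `maybe_optimize` is a hypothetical hook that would
    check the fill ratio and reorganize if needed.
    """
    if when not in ("token", "mail", "batch"):
        raise ValueError("when must be 'token', 'mail' or 'batch'")
    calls = 0
    for tokens in mails:
        for token in tokens:
            store[token] = store.get(token, 0) + 1
            if when == "token":
                maybe_optimize(store)
                calls += 1
        if when == "mail":
            maybe_optimize(store)
            calls += 1
    if when == "batch":
        maybe_optimize(store)
        calls += 1
    return calls
```

For the register-n run quoted above, the per-mail policy would mean 18691 checks, whereas the per-batch policy would run the check only once, after all tokens are already in the database.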

> > I'm just not sure whether a used/buckets ratio of 1.25 is a good
> > threshold value. I know of papers that say you have to reorganize as
> > soon as the ratio gets to 0.80.

> Go figure ;-)

Will do.
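
For reference, the threshold check under discussion could be sketched like this in Python. The function names are made up for illustration, and treating the threshold as a strict "greater than" is an assumption; the thread only names the values 1.25 and 0.80.

```python
def load_ratio(used_records, buckets):
    """Return the used/buckets ratio (records per hash bucket)."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    return used_records / buckets


def needs_reorganize(used_records, buckets, threshold=1.25):
    """True if the ratio exceeds `threshold` (strict '>' is an assumption)."""
    return load_ratio(used_records, buckets) > threshold
```

With 1913 buckets, 1600 records give a ratio of about 0.84, which would trigger a reorganization at a threshold of 0.80 but not at 1.25. In QDBM's Depot API the two inputs should be available via `dprnum` and `dpbnum`, with `dpoptimize` doing the actual reorganization; how bogofilter wires these together is not shown here.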

> > You can calculate the problem for yourself. If, at the beginning,
> > only 1913 buckets are available and I want to feed over 300000
> > words into the word lists, then this is no good.
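
The calculation referred to above, worked through as a small Python sketch. The helper `buckets_for` is a hypothetical name; 1913 and 300000 are the figures from the mail, and 1.25 and 0.80 are the two thresholds mentioned earlier in the thread. Ratios are passed as strings so that `Fraction` keeps the arithmetic exact.

```python
import math
from fractions import Fraction


def buckets_for(records, target_ratio):
    """Smallest bucket count that keeps records/buckets <= target_ratio.

    `target_ratio` should be a string such as "1.25" (or a Fraction),
    so that the division is exact.
    """
    return math.ceil(records / Fraction(target_ratio))


initial_buckets = 1913
words = 300000

# Records per bucket if the table is never resized: roughly 156.8.
ratio_without_resize = words / initial_buckets

# Buckets needed to stay at or below each threshold.
buckets_at_1_25 = buckets_for(words, "1.25")   # 240000
buckets_at_0_80 = buckets_for(words, "0.80")   # 375000
```

So without any reorganization each bucket would hold well over a hundred records on average, while either threshold calls for a bucket count in the hundreds of thousands.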

> And if we switched from depot to villa? :->

Erm, villa is the one API that's not yet fully working under RISC OS. :-}
But I'm working on it.

-- 
Stefan Bellon



