sbellon at sbellon.de
Tue Sep 9 01:13:12 EDT 2003
Matthias Andree wrote:
> Stefan Bellon <sbellon at sbellon.de> writes:
> > Yes. I register 100 MB of spam/ham in one go.
> How many tokens in how many mails is that?
08 Sep 08:36:56 006 register-n, 162738 words, 18691 messages
08 Sep 09:16:54 006 register-s, 159930 words, 5650 messages
> > This totally breaks down performance if I don't organize every now
> > and then.
> Makes me wonder if we should really make two calls into qdbm for each
> token added or if it's sufficient to optimize once per mail registered
> -- the latter will be easy as we'll have transactional interfaces
> someday anyways.
Ok, once per mail would be ok as well. But once per batch seems very
wrong to me.
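
To illustrate why the check interval matters, here is a toy model, not
bogofilter or qdbm code. It assumes "used" means stored records, that
every mail adds a fixed number of new tokens, and that a reorganisation
resizes to twice the current record count. The function name and the
resize rule are hypothetical.

```python
def peak_ratio(mail_sizes, buckets, threshold, per_mail):
    """Return (highest records/buckets ratio seen, final bucket count).

    mail_sizes: new distinct tokens added by each mail (toy input).
    per_mail:   True  -> check the ratio after every mail,
                False -> check it only once, after the whole batch.
    """
    records = 0
    peak = 0.0
    for new_tokens in mail_sizes:
        records += new_tokens
        peak = max(peak, records / buckets)
        if per_mail and records / buckets > threshold:
            buckets = 2 * records  # hypothetical resize rule
    if not per_mail and records / buckets > threshold:
        buckets = 2 * records      # same rule, applied once at the end
    return peak, buckets


# 50 mails with 100 new tokens each, starting from 1913 buckets.
mails = [100] * 50
mail_peak, mail_buckets = peak_ratio(mails, 1913, 1.25, per_mail=True)
batch_peak, batch_buckets = peak_ratio(mails, 1913, 1.25, per_mail=False)
print(mail_peak, batch_peak)
```

With a per-mail check the ratio can only overshoot the threshold by one
mail's worth of tokens. With a per-batch check it keeps growing until
the whole batch is in.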
> > I'm just not sure whether a used/buckets ratio of 1.25 is a good
> > threshold value. I know of papers that say you have to reorganize as
> > soon as the ratio gets to 0.80.
> Go figure ;-)
> > You can calculate the problem for yourself. If, at the beginning,
> > there are only 1913 buckets available and I want to feed over 300000
> > words into the word lists, then this is no good.
> And if we switched from depot to villa? :->
Erm, villa is the one API that's not yet fully working under RISCOS. :-}
But I'm working on it.
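
For reference, the bucket arithmetic from earlier in this message can be
worked through as below. This is only a sketch: it assumes the ratio is
records divided by buckets, it takes 300000 as a round figure for the
word count, and the helper names are made up for illustration.

```python
from fractions import Fraction
import math


def load_ratio(records, buckets):
    """Records per bucket for a hash database of the given size."""
    return records / buckets


def buckets_needed(records, max_ratio):
    """Smallest bucket count that keeps records/buckets <= max_ratio.

    max_ratio is passed as a string such as "0.80" so that the
    division is exact instead of going through binary floats.
    """
    return math.ceil(records / Fraction(max_ratio))


words = 300000         # round figure for the word count
initial_buckets = 1913 # bucket count at the start

print(load_ratio(words, initial_buckets))  # ratio without reorganisation
print(buckets_needed(words, "1.25"))       # buckets for the 1.25 threshold
print(buckets_needed(words, "0.80"))       # buckets for the 0.80 threshold
```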