.db rebuilds: comparing versions, and a note on formail

Matthias Andree matthias.andree at gmx.de
Fri Jan 31 14:10:44 CET 2003


On Fri, 31 Jan 2003, Greg Louis wrote:

> Matthias isn't having the same db troubles as I am, according to a
> recent posting, so I thought it might help if I gave some details:
> 
> Just finished rebuilding my spamlist.db with 0.10.1.4:
> 
> # time ./bogofilter -v -s -d /root/scratch </root/.bogofilter/spam_corpus 
> # 5868782 words, 14502 messages
> 
> real    13m24.497s
> user    0m55.840s
> sys     0m17.420s

Could you show the output of db_stat -d spamlist.db?

I fear your huge .db files exhaust DB's buffers at the default buffer
sizes; some DB buffer configuration might help a lot here.
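
As a rough check on that guess, the timings quoted above can be turned
into a CPU-versus-wall-clock ratio. This is only a sketch in Python: the
numbers are copied from the quoted `time` output, and the helper name
to_seconds is made up for illustration. A low CPU share would be
consistent with the process mostly waiting (for instance on disk I/O
after cache misses), but it does not prove that the cache is the cause.

```python
def to_seconds(stamp):
    """Convert a `time` stamp such as '13m24.497s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# Figures copied from the quoted bogofilter run.
real = to_seconds("13m24.497s")
user = to_seconds("0m55.840s")
sys_time = to_seconds("0m17.420s")
words = 5868782

cpu = user + sys_time            # time actually spent on the CPU
cpu_share = cpu / real           # fraction of wall-clock time spent computing
words_per_second = words / real  # overall registration throughput

print(f"CPU time: {cpu:.2f}s of {real:.2f}s wall clock ({cpu_share:.1%})")
print(f"throughput: {words_per_second:.0f} words/s")
```

With these figures the CPU accounts for roughly a tenth of the elapsed
time, so the rest is spent waiting on something other than computation.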

> I think it's the token count difference that matters.  The final token
> count in the goodlist is 174421.  I've got a quantum leap in
> registration time somewhere between three-and-a-half and five million
> tokens, both with db-3.3.17 and with db-4.1.25.  It shows up in
> classification too; messages of quite moderate size can take two or
> more seconds each to process, while 0.8.0 gets them done in a couple
> hundred milliseconds.

It might be that the structure gets too many levels of indirection at
that size, or we might need to tune DB for a bigger cache size.



