.db rebuilds: comparing versions, and a note on formail
Matthias Andree
matthias.andree at gmx.de
Fri Jan 31 14:10:44 CET 2003
On Fri, 31 Jan 2003, Greg Louis wrote:
> Matthias isn't having the same db troubles as I am, according to a
> recent posting, so I thought it might help if I gave some details:
>
> Just finished rebuilding my spamlist.db with 0.10.1.4:
>
> # time ./bogofilter -v -s -d /root/scratch </root/.bogofilter/spam_corpus
> # 5868782 words, 14502 messages
>
> real 13m24.497s
> user 0m55.840s
> sys 0m17.420s
Could you show the output of db_stat -d spamlist.db?
I fear your huge .db files run DB out of buffers at its default buffer
sizes; some DB buffer configuration might help big time here.
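A quick way to see whether that is plausible is to compare the file size
against the cache you think DB is using. Below is a minimal sketch in plain
sh. The function name db_exceeds_cache is made up for this mail, and the
cache size you pass in is whatever you believe is configured. The built-in
default is small next to a multi-megabyte wordlist; check the set_cachesize
docs for your DB version.

```shell
#!/bin/sh
# Rough check: is a Berkeley DB file larger than a given cache size?
# If it is, many page lookups during registration and classification
# probably have to go to disk instead of being served from the cache.

# usage: db_exceeds_cache FILE CACHE_BYTES   (hypothetical helper)
# Exit status: 0 if FILE is larger than CACHE_BYTES, 1 if not,
# 2 if FILE cannot be read.
db_exceeds_cache() {
    file=$1
    cache=$2
    [ -r "$file" ] || return 2
    # wc -c prints the byte count; tr strips the padding some wc add.
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -gt "$cache" ]
}
```

For example, `db_exceeds_cache spamlist.db 262144 && echo "bigger than a 256KB cache"`.
If it says yes, the page counts and number of B-tree levels from
db_stat -d would be the next thing to look at.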
> I think it's the token count difference that matters. The final token
> count in the goodlist is 174421. I've got a quantum leap in
> registration time somewhere between three-and-a-half and five million
> tokens, both with db-3.3.17 and with db-4.1.25. It shows up in
> classification too; messages of quite moderate size can take two or
> more seconds each to process, while 0.8.0 gets them done in a couple
> hundred milliseconds.
It might be that the structure gets too many levels of indirection then,
or we might need to tune DB for bigger cache sizes.
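If bogofilter opens its wordlists inside a Berkeley DB environment (I have
not checked whether 0.10.1.4 does), a DB_CONFIG file in the environment's
home directory can raise the cache without recompiling. Its set_cachesize
line takes gigabytes, bytes and the number of cache regions. Here is a sketch
of a hypothetical helper that prints such a line for a size in megabytes:

```shell
#!/bin/sh
# Print a DB_CONFIG line requesting a cache of MB megabytes.
# DB_CONFIG syntax: set_cachesize <gbytes> <bytes> <ncache>
# usage: cachesize_line MB   (hypothetical helper)
cachesize_line() {
    mb=$1
    gb=$((mb / 1024))                        # whole gigabytes
    bytes=$(( (mb % 1024) * 1024 * 1024 ))   # remainder, in bytes
    printf 'set_cachesize %d %d 1\n' "$gb" "$bytes"
}
```

For example, `cachesize_line 64 >> /root/.bogofilter/DB_CONFIG` would ask for a
64 MB cache. That directory is an assumption; it has to be whatever home
directory the environment is actually opened with, or DB never reads the file.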