.db rebuilds: comparing versions, and a note on formail

Matthias Andree matthias.andree at gmx.de
Fri Jan 31 16:13:26 CET 2003


On Fri, 31 Jan 2003, Greg Louis wrote:

> > > # time ./bogofilter -v -s -d /root/scratch </root/.bogofilter/spam_corpus 
> > > # 5868782 words, 14502 messages
> > > 
> > > real    13m24.497s
> > > user    0m55.840s
> > > sys     0m17.420s
> > 
> > Could you show the output of db_stat spamlist.db?
>  
> > I fear your huge .db files run DB out of buffers with its default buffer
> > sizes, some DB buffer configuration might help big time here.

I tried with a mail corpus of 83,554 kBytes containing 5.14 million words
and 13,500 messages, and got it registered in under one minute. Most of
the time, bogofilter is at 100 % CPU. Reading all of the input and
printing the "# words/messages" line takes about 44 seconds; after that,
it takes another 12 seconds to close the database and flush all the
data to disk.

$ buffer </tmp/junkmail -S250k | time bogofilter -nv
      83554K
# 5143887 words, 13541 messages
38.86user 5.12system 0:54.64elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (243major+1669minor)pagefaults 0swaps
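For a rough side-by-side view, the figures from the two runs in this thread can be reduced to words registered per wall-clock second and to CPU share, (user + sys) / real. The sketch below only does that arithmetic on the numbers quoted above; the helper names are hypothetical, and it ignores the differences in hardware and distinct-token count.

```python
# Rough comparison of the two bogofilter registration runs quoted in this thread.
# All figures are copied from the messages; helper names are hypothetical.


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a bash `time` value such as 13m24.497s to seconds."""
    return minutes * 60 + seconds


def summarize(words: int, real: float, user: float, sys: float) -> dict:
    """Return words per wall-clock second and the CPU share (user+sys)/real."""
    return {
        "words_per_sec": words / real,
        "cpu_share": (user + sys) / real,
    }


# Greg's run: 5868782 words, real 13m24.497s, user 0m55.840s, sys 0m17.420s
greg = summarize(5868782, to_seconds(13, 24.497), to_seconds(0, 55.840), to_seconds(0, 17.420))
# Matthias's run: 5143887 words, 0:54.64 elapsed, 38.86 user, 5.12 system
matthias = summarize(5143887, 54.64, 38.86, 5.12)

for name, r in (("Greg", greg), ("Matthias", matthias)):
    print(f"{name:9s} {r['words_per_sec']:9.0f} words/s  CPU {r['cpu_share']:.0%}")
```

This works out to roughly 7,295 words/s at 9 % CPU for the first run versus about 94,141 words/s at 80 % CPU for the second. A low CPU share with a long wall-clock time is consistent with, though it does not prove, the guess that the first run spent most of its time waiting on disk I/O.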

However, this is not an apples-to-apples comparison, because I only have
160,000 distinct tokens now. *shrug* Probably because there's less junk
in it now (boundary lines, tokens that old versions extracted,
whatever).

AMD Duron 700, 320 MB RAM, 7,200 rpm Ultra160 SCSI drive.
