.db rebuilds: comparing versions, and a note on formail

Greg Louis glouis at dynamicro.on.ca
Fri Jan 31 17:01:13 CET 2003


On 20030131 (Fri) at 1613:26 +0100, Matthias Andree wrote:

> >  
> > > I fear your huge .db files run DB out of buffers with its default buffer
> > > sizes, some DB buffer configuration might help big time here.
> 
> I tried with a database of 83,554 kBytes containing 5.14 million words
> and 13,500 messages, and got that registered in under one minute. Most of
> the time, bogofilter is at 100 % CPU, after it has read all of the data
> base and printed the # words/messages line, which takes like 44 seconds,
> it takes another 12 seconds to close the data base and flush all the
> data to disk.
> 
> $ buffer </tmp/junkmail -S250k | time bogofilter -nv
>       83554K
> # 5143887 words, 13541 messages
> 38.86user 5.12system 0:54.64elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (243major+1669minor)pagefaults 0swaps
> 
> However, this is not an apples-apples comparison, because I only have
> 160,000 distinct tokens now. *shrug* probably because there's less junk
> in it now (boundary lines, tokens that old versions extracted,
> whatever).

Odd.  I get 160,000 more tokens (half a million in all) in spamlist.db
with 0.10.1.x than with 0.8.0.  The two goodlists have about the same
number of tokens.

Any more ideas on buffering/caching that I could try?  I did post the
stats about an hour ago -- did you see them yet?
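
As a cross-check on the `time` output quoted above, here is a small
sketch in Python that recomputes the CPU share and a rough throughput
from the quoted figures. The numbers are copied from that run; the
function and variable names are illustrative only and are not part of
bogofilter.

```python
def parse_elapsed(text):
    """Convert an elapsed field such as '0:54.64' ([h:]m:s) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Figures copied from the quoted `time bogofilter -nv` run.
user_s = 38.86        # user CPU seconds
system_s = 5.12       # system CPU seconds
elapsed_s = parse_elapsed("0:54.64")
words = 5143887       # from the "# ... words, ... messages" line
messages = 13541

cpu_s = user_s + system_s                # total CPU time
cpu_percent = 100 * cpu_s / elapsed_s    # share of wall-clock time on CPU
off_cpu_s = elapsed_s - cpu_s            # wall-clock time not spent on CPU
words_per_second = words / elapsed_s

print(f"CPU time:      {cpu_s:.2f} s")
print(f"CPU share:     {cpu_percent:.1f} %")
print(f"Off-CPU time:  {off_cpu_s:.2f} s")
print(f"Throughput:    {words_per_second:.0f} words/s "
      f"({messages / elapsed_s:.0f} messages/s)")
```

The CPU share works out to roughly 80 %, which agrees with the 80%CPU
that `time` reported, and the total CPU time is about 44 seconds. The
remaining 10 to 11 seconds of wall-clock time were spent off the CPU,
which would be consistent with waiting on disk I/O, though these
numbers alone do not prove that.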

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



