bogofilter speed [was: garbage removal]

David Relson relson at osagesoftware.com
Fri May 9 19:11:47 CEST 2003


Marek,

Nice job of testing and graphing.  I might have labeled the horizontal axis 
in KB/MB, i.e. 500K, 5.9M, 11.7M, 17.5M, 23.4M (or whatever).

There's another factor you might want to test, and that's BerkeleyDB's cache 
size.  We've found that increasing the cache size can help performance.  You 
might find it interesting to try several cache sizes and see how each one 
affects your timings.  Code like the following would do it:

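CFG="test.cfg"

# try a range of cache sizes, one bogofilter run per size
for size in 0 2 4 8 16 ; do
    # write a one-line config file selecting this cache size
    cat <<Eof >"$CFG"
db_cachesize=$size
Eof

    bogofilter -c "$CFG" ...
done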
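
If you want the elapsed time recorded for each cache size as well, the loop 
above could be extended roughly as follows.  This is only a sketch: 
run_scoring and time_run are names made up for illustration, run_scoring is 
an empty stand-in that you'd replace with your own bogofilter command line, 
and it assumes your date(1) supports +%s (GNU and BSD date do).

```shell
#!/bin/sh
# Hypothetical stand-in for the real scoring run.  Replace the body with
# your own bogofilter invocation using the config file passed as $1.
run_scoring() {
    cfg=$1
    :   # no-op placeholder so the sketch runs as-is
}

# Write a config file for one cache size, time one run, print "size seconds".
time_run() {
    size=$1
    cfg=$2
    printf 'db_cachesize=%s\n' "$size" > "$cfg"
    start=$(date +%s)
    run_scoring "$cfg"
    end=$(date +%s)
    echo "$size $((end - start))"
}

CFG="${TMPDIR:-/tmp}/cachetest.$$.cfg"
for size in 0 2 4 8 16 ; do
    time_run "$size" "$CFG"
done
rm -f "$CFG"
```

Each output line pairs a cache size with the wall-clock seconds for that 
run, which is easy to feed into a plotting tool.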

Greg Louis and I have been doing some testing to see what happens when the 
two wordlists (spamlist.db and goodlist.db) are combined in one wordlist, 
i.e. wordlist.db.  BerkeleyDB's default cache (256k) works poorly with a 
combined wordlist.  With a sufficiently large cache, the combined wordlist 
outperforms the separate wordlists.  For example, on my PIII-500, scoring 
17,083 messages in a 90MB mbox file takes about 168 seconds with separate 
wordlists.  The combined wordlist takes 280 seconds with the default cache 
size, but that drops to 150 seconds with a 6MB cache, and to 144 seconds 
with a 10MB cache.
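
To put those timings on a common scale, here is a small sketch that converts 
each one to messages per second.  The message count and the seconds are the 
figures quoted above; the helper name msgs_per_sec and the labels are just 
for illustration.

```shell
#!/bin/sh
# Messages per second, to one decimal place: msgs_per_sec <messages> <seconds>
msgs_per_sec() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

MSGS=17083   # messages scored in the 90MB mbox

# label:seconds pairs taken from the timings quoted above
for entry in "separate:168" "combined-default:280" \
             "combined-6MB:150" "combined-10MB:144" ; do
    label=${entry%%:*}
    secs=${entry##*:}
    printf '%-18s %4s s  %6s msgs/s\n' \
        "$label" "$secs" "$(msgs_per_sec "$MSGS" "$secs")"
done
```

On these numbers the combined wordlist goes from roughly 61 messages per 
second with the default cache to roughly 119 with a 10MB cache, against 
roughly 102 for the separate wordlists.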

As I said, it'd be interesting to see how cache size affects your 
performance.

David




