garbage removal

Marek Kowal marek.kowal at portal.onet.pl
Fri May 9 08:36:51 CEST 2003


> Others have busier mail servers and bigger wordlists.  Perhaps they'll 
> share their observations with us.

We actually did some very thorough testing of bogofilter efficiency. We
loaded the db with additional spam/ham and observed how this affects the
number of messages processed per second by a single thread of
bogofilter (PIII 1GHz, 2GB RAM).

The "spamlist.db" very quickly stops growing in size as more spam messages
are added - apparently all spam looks the same ;-) On the other hand,
goodlist.db grows to any size you want - it is just a matter of feeding it
an appropriate number of good messages.

Attached is a graph showing the speed of message analysis vs goodlist.db
size. 2MB corresponds to about 1900 good emails, while 11000 emails yield a
23MB goodlist.db (and a 6MB spamlist.db). The labels are in Polish, but the
graph is self-explanatory, so you will make your way through ;-)
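For anyone who wants to play with these sizes, here is a rough sketch of the arithmetic in Python. It only uses the two data points quoted above; treating "MB" as 2**20 bytes is my assumption, and the function name is just for illustration.

```python
MB = 1024 * 1024  # assumption: "MB" in the figures above means 2**20 bytes

# (number of good e-mails, goodlist.db size in bytes), as quoted above
samples = [(1900, 2 * MB), (11000, 23 * MB)]


def bytes_per_message(count, size):
    """Average goodlist.db bytes per good e-mail fed in."""
    return size / count


for count, size in samples:
    avg = bytes_per_message(count, size)
    print(f"{count:>6} e-mails: about {avg:.0f} bytes of goodlist.db per e-mail")

# Growth between the two data points (how much each extra e-mail added).
(c0, s0), (c1, s1) = samples
marginal = (s1 - s0) / (c1 - c0)
print(f"between the two points: about {marginal:.0f} bytes per extra e-mail")
```

With only two points this says nothing about the shape of the curve in between; it just puts the quoted sizes on a per-message scale.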

As you can see, the speed with a 1MB goodlist.db is about 110 e-mails/sec
and drops to about 50 e-mails/sec when working on a 23MB goodlist.db. It
seems to stabilize at that speed - probably the hashing algorithms of
DB3/DB4 are good enough at this db size to keep the lookup time from
increasing further. Test emails had an average size of 40kB. Interesting,
though, is the fact that throughout the tests the CPU consumption never
exceeded 60% - apparently the IO involved blocks the process. However, we
made sure that all the messages analysed were already in cache, so this
must be something other than waiting for the message to be read from disk.
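To put the throughput figures on a per-message scale, here is a small back-of-the-envelope sketch in Python. It assumes the 60% CPU figure can be treated as an upper bound on the CPU share of wall-clock time for every message, which is a simplification; the function names are hypothetical and the rates are the approximate ones quoted above.

```python
AVG_MESSAGE_KB = 40  # average test e-mail size quoted above


def ms_per_message(rate_per_sec):
    """Average wall-clock time per message, in milliseconds."""
    return 1000.0 / rate_per_sec


def split_wall_time(rate_per_sec, cpu_share=0.60):
    """Split per-message time into (CPU, non-CPU) milliseconds.

    cpu_share is the observed CPU ceiling, so the CPU part is an upper
    bound and the non-CPU part a lower bound (assumption: the share is
    uniform across messages).
    """
    total = ms_per_message(rate_per_sec)
    cpu = total * cpu_share
    return cpu, total - cpu


for label, rate in (("1MB goodlist.db", 110), ("23MB goodlist.db", 50)):
    cpu_ms, other_ms = split_wall_time(rate)
    print(f"{label}: {ms_per_message(rate):.1f} ms/msg, "
          f"at most {cpu_ms:.1f} ms CPU, at least {other_ms:.1f} ms elsewhere, "
          f"about {rate * AVG_MESSAGE_KB} kB/s of mail scanned")
```

So at the larger db size each message takes roughly 20 ms, of which several milliseconds are spent somewhere other than on the CPU, which is the part still unexplained.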

Tests were performed in bulk mode; each message was a separate file.

Cheers,
Marek

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gd[1].png
Type: application/octet-stream
Size: 6451 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20030509/2cf5a15b/attachment.obj>

