benchmark

Tom Anderson tanderso at oac-design.com
Thu Feb 12 07:44:37 CET 2004


On Wed, 2004-02-11 at 19:10, David Relson wrote:
> You indicate a problem but don't give much info about your environment
> or what you're doing.  In order to fix anything I need more info about
> what is (or might be) wrong.

model name      : AMD-K6(tm) 3D processor
stepping        : 12
cpu MHz         : 448.220
cache size      : 64 KB

linux 2.2.16
gcc 2.95.2

This is average usage:
12:29am  up 138 days,  8:41,  4 users,  load average: 1.11, 1.28, 1.04
59 processes: 55 sleeping, 3 running, 0 zombie, 1 stopped
CPU states:  9.4% user, 32.4% system,  0.0% nice, 58.0% idle
Mem:  257644K av, 250872K used,   6772K free, 112248K shrd,  76348K buff
Swap: 131536K av,  15540K used, 115996K free  122924K cached

> How big is the mailbox you're registering?  How
> big is your wordlist?  What other factors do you think relate to the...
> You can expect to see a size reduction.  You may also see a performance
> improvement.

Ok, I did the optimization, and this is the result of registering the
same 53 emails via bfproxy both before and after:

Before:
-rw-r--r--   1 tanderso home     22331392 Feb 12 00:20 wordlist.db

53 emails found, containing 7640 lines total.
15234 words from 53 emails were registered.
Total running time was 147 wallclock secs, 24.44 CPU secs.
0.003 CPU secs/line, 0.461 CPU secs/email.
Bfproxy required 0.87 usr + 0.77 sys = 1.64 CPU secs.
Bogofilter required 2.97 usr + 19.83 sys = 22.8 CPU secs.

After:
-rw-r--r--   1 tanderso home     15384576 Feb 12 00:22 wordlist.db

53 emails found, containing 7640 lines total.
15234 words from 53 emails were registered.
Total running time was 123 wallclock secs, 22.68 CPU secs.
0.003 CPU secs/line, 0.428 CPU secs/email.
Bfproxy required 0.84 usr + 0.71 sys = 1.55 CPU secs.
Bogofilter required 2.95 usr + 18.18 sys = 21.13 CPU secs.


The size difference is fairly significant at roughly a 1/3 reduction;
however, the performance has not changed much (~7% less CPU time), and
the ratio of user to system time is basically the same.  Any differences
are within a reasonable deviation.  Also, the page faults (310 major/341
minor) when running "time" from the command line are essentially the
same as well.
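
For anyone who wants to recheck the arithmetic, here is a minimal Python
sketch that recomputes the percentages from the figures quoted above.
The dictionary keys and function names are made up for illustration;
only the numbers come from the two runs.  It gives about a 31% smaller
wordlist.db, about 7% less total CPU time, about 16% less wallclock
time, and a bogofilter sys/usr ratio a little above 6 in both runs.

```python
# Figures copied from the "Before" and "After" runs quoted above.
# The key names are illustrative, not bfproxy or bogofilter output fields.
BEFORE = {"db_bytes": 22331392, "wall": 147, "cpu": 24.44,
          "bf_usr": 2.97, "bf_sys": 19.83}
AFTER = {"db_bytes": 15384576, "wall": 123, "cpu": 22.68,
         "bf_usr": 2.95, "bf_sys": 18.18}


def pct_drop(old, new):
    """Return the percentage decrease going from old to new."""
    return 100.0 * (old - new) / old


def summarize(before, after):
    """Compare two runs: size, wallclock, CPU time, and sys/usr ratio."""
    return {
        "db_size_drop_pct": pct_drop(before["db_bytes"], after["db_bytes"]),
        "wall_drop_pct": pct_drop(before["wall"], after["wall"]),
        "cpu_drop_pct": pct_drop(before["cpu"], after["cpu"]),
        "sys_usr_before": before["bf_sys"] / before["bf_usr"],
        "sys_usr_after": after["bf_sys"] / after["bf_usr"],
    }


if __name__ == "__main__":
    for name, value in summarize(BEFORE, AFTER).items():
        print(f"{name:18s} {value:6.2f}")
```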

I'm not entirely sure what significance the page fault numbers would
have in causing an abnormally high system time.  From my understanding,
page faults are generated whenever the CPU requests a memory address
that the MMU cannot translate.  The page fault handler interrupts the
CPU to do virtual memory management.  Major faults involve reading pages
in from disk, while minor faults are resolved in RAM (expanding or
rearranging allocations).  Surely, this could cause slow system
response; however, your example data had tremendously more page faults
than mine, but apparently that did not affect the performance.
Therefore, either my hard disk/bus is horribly sluggish, or page faults
are not causing the performance problem.
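
To compare user time, system time, and page faults for a single command
on different machines, something like the following Python sketch could
be used.  It is only an illustration, not part of bogofilter or bfproxy:
measure() is a hypothetical helper, the command passed to it is a
placeholder, and it relies on resource.getrusage(), which is available
on Unix-like systems only.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion; return wall time, CPU times and page faults.

    RUSAGE_CHILDREN accumulates usage of all waited-for child processes,
    so the difference between two snapshots isolates this one command.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall": wall,
        "usr": after.ru_utime - before.ru_utime,        # user CPU seconds
        "sys": after.ru_stime - before.ru_stime,        # system CPU seconds
        "major_faults": after.ru_majflt - before.ru_majflt,  # needed disk I/O
        "minor_faults": after.ru_minflt - before.ru_minflt,  # resolved in RAM
    }


if __name__ == "__main__":
    # Placeholder command; substitute the real registration command here.
    result = measure([sys.executable, "-c", "pass"])
    for name, value in result.items():
        print(f"{name:13s} {value}")
```

A high "sys" figure together with few major faults would suggest that
the system time is going somewhere other than paging from disk.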

> You don't mention how big your wordlist is nor whether you're setting
> the db cachesize parameter.  That may help as well.

I do not maintain huge corpora of spam and ham from which to generate a
cachesize parameter.  Therefore, I do not set one.  Considering my
sys/usr ratio was roughly the same before and after
shrinking/restructuring my database, I'm thinking that this probably
wouldn't change much anyway.  Nonetheless, I tried setting it to
arbitrary values between 0 and 240 (above which it wouldn't read the
wordlist).  The lower the value, the better the results, with 0 giving
roughly the same numbers as running without -k, so I assume 0 is
the default.  The sys/usr ratio did change somewhat throughout the
range, but sys was always at least double and usually 4-10 times higher,
and the closer numbers only came when both were unreasonably high.  The
page faults (minor usually) also increased substantially with higher
cachesize numbers.

Although your and my example systems are both around 500MHz, I notice
yours is a PIII while mine is a K6.  Are there any Intel optimizations
made in the code or make files?  I think we've essentially eliminated
memory problems, right?

Does anyone else on this list have abnormally high system times?

Tom
