simple timing test [was: [cvs] bogofilter/src mime.c]
David Relson
relson at osagesoftware.com
Tue Jan 4 13:33:32 CET 2005
On Tue, 04 Jan 2005 12:11:38 +0300
Evgeny Kotsuba wrote:
> Matthias Andree wrote:
>
> >Evgeny Kotsuba <evgen at shatura.laser.ru> writes:
> >
> >
> >
> >>Well, I don't want to dispute this - for me sizeof() is more readable, you
> >>like strlen() - so be it, but then use a constant int ;-)
> >>I agree that this has very little effect on optimization.
> >>On the other hand, I am not satisfied with 10 messages/sec on an Athlon and
> >>can't find where those inner layers for optimization are.
> >>
> >>
> >
> >/Usually/ the limiting factor is I/O speed, and that you won't change
> >with any optimization. CPUs have become pretty fast, sequential
> >throughput of drives has improved, but random access is the
> >bottleneck. Server disk drives rotate faster, with shorter strokes,
> >making more noise, to improve the number of synchronous operations, and
> >have more sophisticated queueing (SCSI tagged command queueing).
> >
> I use JFS with a 200 MB cache, and the database is no more than 50 MB,
> so disk operations are not the limiting factor. The typical message size is
> about 10-20 KB, so 200 KB/sec of input is not limiting either.
>
> >To find out where the program spends its time, use a decent profiling
> >tool, Linux has oprofile, Sun touts Solaris 10's DTrace, and there are
> >other tools.
> >
> >Even on older machines, strace with timestamps enabled (-tt or -ttt) may
> >give hints. If it spends a lot of time in open, read, write, fsync,
> >close, ... you know optimizing the code will not help. In that case,
> >only changing algorithms to reduce the number of synchronous I/O
> >operations can help.
> >
> >
> Well, I have found one thing: for binary attachments, all input
> strings are passed through the whole lexer and decoding.
> Look at lexer.c -> yyinput()
>
> //extern mime_t *msg_state;
> if (msg_state)
> {
>     if (msg_state->mime_disposition)
>     {
>         if (msg_state->mime_type == MIME_APPLICATION ||
>             msg_state->mime_type == MIME_IMAGE)
>             return (count == EOF ? 0 : count); // do not decode at all
>     }
> }
>
> This is an attempt to skip decoding, but really we need to do it somewhere
> earlier, just after the string is read.
>
> I have already asked about bogofilter's speed in a real environment, i.e.
> on a per-message basis - but nobody answered me.
>
> SY,
> EK
Evgeny,
Using a for loop and the time utility, you can run simple timing tests
with the two mbox files included in "make check", i.e.
tests/inputs/spam.mbx and tests/inputs/good.mbx.
Here's the output from my Athlon 2500 running gcc and Linux:
[relson at osage src]$ grep -c "^From " tests/inputs/????.mbx
tests/inputs/good.mbx:48
tests/inputs/spam.mbx:21
[relson at osage src]$ for N in tests/inputs/????.mbx ; do time bogofilter -D -M -I $N ; done
Command exited with non-zero status 1
0.26user 0.07system 0:00.36elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1916minor)pagefaults 0swaps
0.12user 0.02system 0:00.16elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1855minor)pagefaults 0swaps
[relson at osage src]$ for N in tests/inputs/????.mbx ; do time -p bogofilter -D -M -I $N ; done
Command exited with non-zero status 1
real 0.33
user 0.26
sys 0.06
real 0.15
user 0.12
sys 0.03
So I show 0.33 sec for the 48 good messages and 0.15 sec for the 21 spam messages.
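For comparison with the 10 messages/sec figure, those timings can be converted to a rough throughput. This is only a back-of-the-envelope sketch: the rate helper below is a hypothetical name, not part of bogofilter, and the elapsed times include program startup, so the per-message cost is if anything overstated.

```shell
# Hypothetical helper (not part of bogofilter): messages per second,
# given a message count and the elapsed "real" time in seconds.
rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

rate 48 0.33   # good.mbx: prints 145
rate 21 0.15   # spam.mbx: prints 140
```

That is roughly 140-145 messages/sec on this machine for the test mailboxes.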
David