simple timing test [was: [cvs] bogofilter/src mime.c]

David Relson relson at osagesoftware.com
Tue Jan 4 13:33:32 CET 2005


On Tue, 04 Jan 2005 12:11:38 +0300
Evgeny Kotsuba wrote:

> Matthias Andree wrote:
> 
> >Evgeny Kotsuba <evgen at shatura.laser.ru> writes:
> >
> >  
> >
> >>Well, I don't want to dispute: to me sizeof() is more readable, but you
> >>like strlen(), so let it be; then use a const int, though ;-)
> >>I agree that this has very little effect on optimization.
> >>On the other hand, I am not satisfied with 10 messages/sec on an Athlon and
> >>can't find where the inner layers worth optimizing are.
> >>    
> >>
> >
> >/Usually/ the limiting factor is I/O speed, and that you won't change
> >with any optimization. CPUs have become pretty fast, sequential
> >throughput of drives has improved, but random access is the
> >bottleneck. Server disk drives rotate faster, with shorter strokes,
> >making more noise, to improve the number of synchronous operations, and
> >have more sophisticated queueing (SCSI tagged command queueing).
> >
> I use JFS with a 200 MB JFS cache, and the database is no more than 50 MB,
> so there are no limiting disk operations. The average message size is
> about 10-20 KB, so 200 KB/sec of input is not limiting either.
> 
> >To find out where the program spends its time, use a decent profiling
> >tool, Linux has oprofile, Sun touts Solaris 10's DTrace, and there are
> >other tools.
> >
> >Even on older machines, strace with timestamps enabled (-tt or -ttt) may
> >give hints. If it spends a lot of time in open, read, write, fsync,
> >close, ... you know optimizing the code will not help. In that case,
> >only changing algorithms to reduce the number of synchronous I/O
> >operations can help.
> >  
> >
> Well, I have found one thing: for binary attachments, all input
> lines are passed through the whole lexer and decoding.
> Look at yyinput() in lexer.c:
> 
> //extern mime_t *msg_state;
>   if (msg_state)
>   {
>      if (msg_state->mime_disposition)
>      {
>         if (msg_state->mime_type == MIME_APPLICATION ||
>             msg_state->mime_type == MIME_IMAGE)
>            return (count == EOF ? 0 : count);   // do not decode at all
>      }
>   }
> 
> This is an attempt to skip decoding, but really we need to do it somewhere
> earlier, just after the line is read.
> 
> I have already asked about bogofilter's speed in a real environment, i.e.
> on a per-message basis, but nobody answered me.
> 
> SY,
> EK
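Evgeny's suggestion of dropping binary-attachment lines right after they are read, rather than inside yyinput(), can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `mime_state_t`, `mime_type_t`, `skip_body_line()`, `body_line_t` and `filter_lines()` are hypothetical names, not bogofilter's actual `mime_t` or reader API; only the MIME_APPLICATION/MIME_IMAGE test mirrors the quoted fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL stand-ins for bogofilter's MIME state; the real mime_t
 * has more fields and its definitions may differ. */
typedef enum { MIME_TEXT, MIME_APPLICATION, MIME_IMAGE, MIME_OTHER } mime_type_t;

typedef struct {
    mime_type_t mime_type;
    bool        mime_disposition; /* a Content-Disposition header was seen */
} mime_state_t;

/* Same test as the quoted yyinput() fragment: an application/ or image/
 * part with a Content-Disposition header is treated as a binary
 * attachment whose lines need neither decoding nor lexing. */
static bool skip_body_line(const mime_state_t *state)
{
    if (state == NULL || !state->mime_disposition)
        return false;
    return state->mime_type == MIME_APPLICATION ||
           state->mime_type == MIME_IMAGE;
}

/* HYPOTHETICAL: a raw, still-encoded body line tagged with its MIME part. */
typedef struct {
    const mime_state_t *state;
    const char         *text;
} body_line_t;

/* Apply the check immediately after reading: only lines that survive are
 * copied to out[] (these would go on to base64/QP decoding and the lexer).
 * Returns the number of lines kept. */
static size_t filter_lines(const body_line_t *in, size_t n, const char **out)
{
    size_t kept = 0;

    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        if (!skip_body_line(in[i].state))
            out[kept++] = in[i].text;
    return kept;
}
```

The point of the design is where the test sits: checked once per line before any decoding, it avoids the decoding work that, per Evgeny's description, currently happens before yyinput() discards the result.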

Evgeny,

Simple timing tests can be run with a for loop and the time utility on
the two mbox files included in "make check", i.e.
tests/inputs/spam.mbx and tests/inputs/good.mbx:

Here's the output from my Athlon 2500 running gcc and Linux:

[relson at osage src]$ grep -c "^From " tests/inputs/????.mbx
tests/inputs/good.mbx:48
tests/inputs/spam.mbx:21
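The message counts above come from counting lines that begin with "From ", the mbox message separator (a body line that begins with "From " is normally escaped as ">From " when written to an mbox). A minimal C equivalent of `grep -c "^From "` might look like this; `count_from_lines()` is a hypothetical helper for illustration, not part of bogofilter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count lines that start with "From ", the same test as grep -c "^From ".
 * Reads character by character, so arbitrarily long lines are handled. */
static int count_from_lines(FILE *fp)
{
    static const char marker[] = "From ";
    const size_t marker_len = sizeof marker - 1;  /* 5, excluding the NUL */
    size_t matched = 0;   /* marker characters matched on this line so far */
    int in_prefix = 1;    /* still checking the start of the current line */
    int count = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {              /* new line: restart the prefix check */
            matched = 0;
            in_prefix = 1;
        } else if (in_prefix) {
            if (c != marker[matched]) {
                in_prefix = 0;        /* line does not start with "From " */
            } else if (++matched == marker_len) {
                count++;              /* full "From " prefix seen */
                in_prefix = 0;
            }
        }
    }
    return count;
}
```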

[relson at osage src]$ for N in tests/inputs/????.mbx ; do time bogofilter -D -M -I $N ; done
Command exited with non-zero status 1
0.26user 0.07system 0:00.36elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1916minor)pagefaults 0swaps
0.12user 0.02system 0:00.16elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1855minor)pagefaults 0swaps

[relson at osage src]$ for N in tests/inputs/????.mbx ; do time -p bogofilter -D -M -I $N ; done
Command exited with non-zero status 1
real 0.33
user 0.26
sys 0.06
real 0.15
user 0.12
sys 0.03

So my runs take 0.33 sec (real time) for the 48 good messages and 0.15 sec for the 21 spam messages.
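Since the question was about speed on a per-message basis, the arithmetic is simply elapsed time divided by message count. The names `rate_t`, `batch_rate()` and `print_rate()` are hypothetical helpers for illustration. Note that process start-up and database opening are folded into each batch, so the per-message figure slightly overstates the steady-state cost.

```c
#include <assert.h>
#include <stdio.h>

/* Per-message cost derived from one batch run of bogofilter -M. */
typedef struct {
    double ms_per_msg;    /* milliseconds of elapsed time per message */
    double msgs_per_sec;  /* throughput over the whole batch */
} rate_t;

static rate_t batch_rate(double elapsed_sec, int messages)
{
    rate_t r;

    assert(elapsed_sec > 0.0 && messages > 0);
    r.ms_per_msg   = 1000.0 * elapsed_sec / messages;
    r.msgs_per_sec = messages / elapsed_sec;
    return r;
}

/* Print one summary line for a batch, e.g. from a `time -p` "real" value. */
static void print_rate(const char *label, double elapsed_sec, int messages)
{
    rate_t r = batch_rate(elapsed_sec, messages);

    printf("%-9s %2d msgs in %.2f s: %.1f ms/msg, %.0f msgs/sec\n",
           label, messages, elapsed_sec, r.ms_per_msg, r.msgs_per_sec);
}
```

Feeding in the `time -p` real times above gives about 6.9 ms per message for good.mbx and 7.1 ms for spam.mbx, i.e. roughly 140 to 145 messages/sec, compared with the 10 messages/sec reported in the quoted message.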

David



More information about the bogofilter-dev mailing list