[cvs] bogofilter/src mime.c,1.33,1.34

Evgeny Kotsuba evgen at shatura.laser.ru
Tue Jan 4 10:11:38 CET 2005


Matthias Andree wrote:

>Evgeny Kotsuba <evgen at shatura.laser.ru> writes:
>
>>Well, I don't want to dispute this - to me sizeof() is more readable, you 
>>like strlen() - so be it, but then use a const int ;-)
>>I agree that this has very little effect on optimization.
>>On the other hand, I am not satisfied with 10 messages/sec on an Athlon and 
>>can't find where those inner layers for optimization are.
>
>/Usually/ the limiting factor is I/O speed, and that you won't change
>with any optimization. CPUs have become pretty fast, sequential
>throughput of drives has improved, but random access is the
>bottleneck. Server disk drives rotate faster, with shorter strokes,
>making more noise, to improve the number of synchronous operations, and
>have more sophisticated queueing (SCSI tagged command queueing).
>
I use JFS with a 200 MB JFS cache, and the database is no more than 50 MB, 
so disk operations are not the limiting factor. The typical message size is 
about 10-20 KB, so 200 KB/sec of input (10 messages/sec at up to 20 KB each) 
is not limiting either.

>To find out where the program spends its time, use a decent profiling
>tool: Linux has oprofile, Sun touts Solaris 10's DTrace, and there are
>other tools.
>
>Even on older machines, strace with timestamps enabled (-tt or -ttt) may
>give hints. If it spends a lot of time in open, read, write, fsync,
>close, ... you know that optimizing the code will not help. In that case,
>only changing algorithms to reduce the number of synchronous I/O
>operations can help.
>
Well, I have found one thing: for binary attachments, every input 
string is still passed through the entire lexer and the decoding code.
Look at yyinput() in lexer.c:

//extern mime_t *msg_state;
  if (msg_state && msg_state->mime_disposition)
  {
      if (msg_state->mime_type == MIME_APPLICATION ||
          msg_state->mime_type == MIME_IMAGE)
          return (count == EOF ? 0 : count);   // do not decode at all
  }

This is an attempt to skip decoding, but really we need to do it somewhere 
earlier, right after the string is read.
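
To illustrate what "earlier" could look like, here is a minimal sketch of 
a check applied right after a line is read, before it reaches the decoder 
or the lexer. It is not bogofilter's actual code: the struct below is a 
simplified stand-in for mime_t, and MIME_TEXT, mime_type_t, skip_decoding() 
and filter_line() are hypothetical names used only for illustration. It also 
assumes that a non-zero mime_disposition marks an attachment part.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for bogofilter's MIME state (assumption:
 * the real definitions in mime.h have more fields and values). */
typedef enum { MIME_TEXT, MIME_APPLICATION, MIME_IMAGE } mime_type_t;

typedef struct {
    mime_type_t mime_type;
    int mime_disposition;   /* assumed non-zero for an attachment part */
} mime_t;

/* True when the current part is a binary attachment whose body
 * does not need to be decoded or tokenized. */
bool skip_decoding(const mime_t *state)
{
    if (state == NULL || !state->mime_disposition)
        return false;
    return state->mime_type == MIME_APPLICATION ||
           state->mime_type == MIME_IMAGE;
}

/* Hypothetical reader hook, called right after a line of `count`
 * bytes has been read. Returns how many bytes to hand on to the
 * decoder and lexer: 0 drops the line, `count` keeps it. */
size_t filter_line(const mime_t *state, size_t count)
{
    return skip_decoding(state) ? 0 : count;
}
```

A real change would still have to let MIME boundary lines through, so that 
the end of the attachment is detected and the state is reset.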

I have already asked about bogofilter's speed in a real environment, i.e. 
on a per-message basis, but nobody answered me.
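
For comparing such numbers, a per-message rate could be measured with 
something like the sketch below. It uses only the standard clock() call, 
so it counts CPU time, not wall-clock time, which fits the case where disk 
I/O is not the bottleneck. The names process_fn, cpu_seconds(), 
rate_per_second() and measure_rate() are hypothetical, and the callback 
stands in for whatever processes one message.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: processes message number `index`. */
typedef void (*process_fn)(size_t index);

/* CPU seconds between two clock() readings. */
double cpu_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Messages per second; 0 if the interval is too short to measure. */
double rate_per_second(size_t n_messages, double seconds)
{
    return seconds > 0.0 ? (double)n_messages / seconds : 0.0;
}

/* Runs `process` once per message and returns the measured rate. */
double measure_rate(process_fn process, size_t n_messages)
{
    clock_t start = clock();
    size_t i;

    for (i = 0; i < n_messages; i++)
        process(i);

    return rate_per_second(n_messages, cpu_seconds(start, clock()));
}
```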

SY,
EK



