[cvs] bogofilter/src mime.c,1.33,1.34

Matthias Andree matthias.andree at gmx.de
Tue Jan 4 13:12:54 CET 2005


Evgeny Kotsuba <evgen at shatura.laser.ru> writes:

>>/Usually/ the limiting factor is I/O speed, and that you won't change
>>with any optimization. CPUs have become pretty fast, sequential
>>throughput of drives has improved, but random access is the
>>bottleneck. Server disk drives rotate faster, with shorter strokes,
>>making more noise, to improve the number of synchronous operations, and
>>have more sophisticated queueing (SCSI tagged command queueing).
>>
> I use JFS and the JFS cache is 200 MB; the database is no more than
> 50 MB, so disk operations are not a limiting factor. The average
> message size is about 10-20 KB, so 200 KB/s of input is not limiting
> either.

The limiting factor on reads and on synchronous writes is the average
access time of the drive. It is composed of the actual seek time (moving
the heads to the right position; this depends on the drive's design and
grows the more scattered the accesses are, for instance 9 ms for a half
stroke) and the rotational latency (half the rotational period, which
itself is the reciprocal of the rotational frequency, for instance
4.2 ms for a 7200/min drive). The OS and drive _MUST_ wait for the
proper block on synchronous writes, although caches can help somewhat
with read performance.
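
To make the arithmetic concrete, here is a small sketch in Python. The
function names are made up for illustration, and the 9 ms seek time is
just the example figure from above, not a measurement of any particular
drive.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in ms."""
    period_ms = 60_000.0 / rpm  # one full revolution in milliseconds
    return period_ms / 2.0


def average_access_ms(seek_ms: float, rpm: float) -> float:
    """Average access time = seek time + average rotational latency."""
    return seek_ms + rotational_latency_ms(rpm)


def random_ops_per_second(seek_ms: float, rpm: float) -> float:
    """Rough upper bound on random (uncached) accesses per second."""
    return 1000.0 / average_access_ms(seek_ms, rpm)


if __name__ == "__main__":
    # Example figures from the text: 9 ms half-stroke seek, 7200/min drive.
    print(f"rotational latency: {rotational_latency_ms(7200):.1f} ms")  # 4.2 ms
    print(f"average access:     {average_access_ms(9.0, 7200):.1f} ms")  # 13.2 ms
    print(f"random ops/s:       {random_ops_per_second(9.0, 7200):.0f}")  # 76
```

With these example numbers the drive manages on the order of 76 random
accesses per second, no matter how fast the CPU is.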

I don't know what the JFS cache semantics on AIX or OS/2 are. On Linux,
the cache is allocated automatically and dynamically by the kernel
itself. The journal size is irrelevant for JFS, as it journals only
metadata, not file data.

>>Even on older machines, strace with timestamps enabled (-tt or -ttt) may
>>give hints. If it spends a lot of time in open, read, write, fsync,
>>close, ... you know that optimizing the code will not help. In that
>>case, only changing algorithms to reduce the number of synchronous I/O
>>operations can help.
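
As a rough illustration of reading such a trace, the sketch below sums
the gap between consecutive timestamps per system call. It assumes
single-process "strace -ttt" output (no -f, so no PID column); the
function name and the sample lines are invented for this example and are
not real bogofilter output. The gap also includes user-space time spent
before the next call, so treat the totals as an upper bound; strace's -T
option reports the time spent inside each call more directly.

```python
import re
from collections import defaultdict

# "<epoch seconds>.<microseconds> name(" at the start of a syscall line.
CALL_RE = re.compile(r"^(\d+\.\d+)\s+(\w+)\(")
# Any line that starts with a timestamp (syscalls, signals, exit notes).
TS_RE = re.compile(r"^(\d+\.\d+)\s")

# Invented sample trace, for illustration only.
SAMPLE = [
    '1104840774.000000 open("example.db", O_RDWR) = 3',
    '1104840774.001000 read(3, "..."..., 4096) = 4096',
    '1104840774.003000 write(3, "..."..., 4096) = 4096',
    '1104840774.004000 fsync(3) = 0',
    '1104840774.016000 close(3) = 0',
    '1104840774.016500 +++ exited with 0 +++',
]


def time_per_syscall(lines):
    """Attribute the time until the next timestamped line to each syscall."""
    totals = defaultdict(float)
    prev_name, prev_ts = None, None
    for line in lines:
        ts_match = TS_RE.match(line)
        if not ts_match:
            continue  # skip lines without a leading timestamp
        ts = float(ts_match.group(1))
        if prev_name is not None:
            totals[prev_name] += ts - prev_ts
        call = CALL_RE.match(line)
        prev_name = call.group(2) if call else None
        prev_ts = ts
    return dict(totals)


if __name__ == "__main__":
    totals = time_per_syscall(SAMPLE)
    # Print the most expensive calls first.
    for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {seconds * 1000:8.3f} ms")
```

If fsync and friends dominate a listing like this, the time is going
into waiting for the disk rather than into the code.
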
>
> Well, I have found one thing: on binary attachments, all input
> strings are passed through the whole lexer and decoding.
> Look at lexer.c -> yyinput()

Will investigate. I presume there are some bugs in mime.c left.

> I have already asked about bogofilter's speed in a real environment,
> i.e. on a per-message basis, but nobody answered me.

Probably because no "hard" data was available. I do, however, have
some older mails still marked unread, i.e. postponed for later review
when I have more time on my hands.

-- 
Matthias Andree


