[cvs] bogofilter/src mime.c,1.33,1.34
Matthias Andree
matthias.andree at gmx.de
Tue Jan 4 13:12:54 CET 2005
Evgeny Kotsuba <evgen at shatura.laser.ru> writes:
>>/Usually/ the limiting factor is I/O speed, and that you won't change
>>with any optimization. CPUs have become pretty fast, sequential
>>throughput of drives has improved, but random access is the
>>bottleneck. Server disk drives rotate faster, with shorter strokes,
>>making more noise, to improve the number of synchronous operations, and
>>have more sophisticated queueing (SCSI tagged command queueing).
>>
> I use JFS with a 200 MB JFS cache, and the database is no more than
> 50 MB, so disk operations are not the limit. Average message size is
> about 10-20 kB, so 200 kB/s of input is not limiting either.
The average access time of a drive is the limiting factor on reads and
on synchronous writes. It is the sum of the actual seek time (moving the
heads to the right position; this depends on the drive design and grows
the more scattered the accesses are, for instance 9 ms for a half
stroke) and the rotational latency (half the rotational period, which
itself is the reciprocal of the rotational frequency, for instance
4.2 ms for a 7200/min drive). For synchronous writes, the OS and drive
_MUST_ wait for the proper block, although caches can help somewhat with
read performance.
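To put numbers on this, here is a minimal sketch in Python using the
example figures above (9 ms half-stroke seek, 7200/min). The function
names are made up for illustration, and the result is only a rough upper
bound that ignores caching and command queueing:

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one rotation period, in ms."""
    period_ms = 60_000.0 / rpm  # duration of one full rotation
    return period_ms / 2.0


def avg_access_ms(seek_ms, rpm):
    """Average access time = seek time + average rotational latency."""
    return seek_ms + rotational_latency_ms(rpm)


def max_random_ops_per_s(seek_ms, rpm):
    """Rough ceiling on uncached random accesses per second."""
    return 1000.0 / avg_access_ms(seek_ms, rpm)


if __name__ == "__main__":
    print(f"rotational latency: {rotational_latency_ms(7200):.2f} ms")
    print(f"average access:     {avg_access_ms(9.0, 7200):.2f} ms")
    print(f"random ops/s:       {max_random_ops_per_s(9.0, 7200):.0f}")
```

With these figures the drive manages roughly 76 uncached random accesses
or synchronous writes per second, no matter how fast the CPU is.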
I don't know what the JFS cache semantics on AIX or OS/2 are; on Linux,
the cache is automatically and dynamically allocated by the kernel
itself. The journal size is irrelevant for JFS, as it journals only
metadata, not file data.
>>Even on older machines, strace with timestamps enabled (-tt or -ttt) may
>>give hints. If it spends a lot of time in open, read, write, fsync,
>>close, ... you know that optimizing the code will not help. In that
>>case, only changing the algorithms to reduce the number of synchronous
>>I/O operations can help.
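
A rough sketch of how such a trace could be summarised, assuming it was
recorded with "strace -ttt -o trace.txt ..." and without -f (so lines
carry no pid prefix). The sample lines are invented for illustration and
are not real bogofilter output. The gap between two consecutive
timestamps is attributed to the earlier call, so it also includes any
user-space time before the next call; strace -T reports the time spent
inside each call directly.

```python
import re
from collections import defaultdict

# Matches "<epoch seconds>.<microseconds> <syscall>(" at the line start.
LINE = re.compile(r"^(\d+\.\d+)\s+(\w+)\(")


def time_per_syscall(lines):
    """Sum the gap to the next traced call, grouped by syscall name."""
    totals = defaultdict(float)
    prev = None  # (timestamp, syscall name) of the previous traced call
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip signal and exit lines
        ts, name = float(m.group(1)), m.group(2)
        if prev is not None:
            totals[prev[1]] += ts - prev[0]
        prev = (ts, name)
    return dict(totals)


# Invented sample trace (assumption, for illustration only).
SAMPLE = [
    '1104840774.000000 open("wordlist.db", O_RDWR) = 3',
    '1104840774.001000 read(3, "..."..., 4096) = 4096',
    '1104840774.003000 write(3, "..."..., 4096) = 4096',
    '1104840774.004000 fsync(3) = 0',
    '1104840774.017000 close(3) = 0',
]

if __name__ == "__main__":
    totals = time_per_syscall(SAMPLE)
    for name, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {secs * 1000:8.3f} ms")
```

The last call in a trace gets no time attributed, since there is no
following timestamp to measure against.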
>
> Well, I have found one thing: for binary attachments, all input
> strings are passed through the whole lexer and decoding.
> Look at lexer.c -> yyinput()
Will investigate. I presume there are some bugs in mime.c left.
> I have already asked about bogofilter's speed in a real environment,
> i.e. on a per-message basis, but nobody answered me.
Probably because no "hard" data was available. However, I still have
some older mails marked unread, i.e. postponed for later review when I
have more time on my hands.
--
Matthias Andree