ext3fs slowness -- how things proceed

Greg Louis glouis at dynamicro.on.ca
Wed Feb 5 13:01:05 CET 2003


On 20030205 (Wed) at 1220:47 +0100, Matthias Andree wrote:
> Matt Armstrong wrote on Tuesday, 04 February 2003:
> 
> > I do wonder if ordering the writes to the db would help.  This is the
> > idea I've brought up before.  If we write keys to the db in the same
> > order that the db sorts them, theoretically, no single page in the BDB
> > would be written to disk more than once -- though this depends on how
> > BDB is written.
> 
> Some pages will be rewritten, but few compared to what we rewrite now.
> 
> The question is whether it's more efficient to lean on BerkeleyDB's
> caching, even when the cache can't hold the whole DB, or to tune our
> write order first.
> 
When working with a 20-million-byte (about 19 MB) database, I found
that a 17 MB cache was enough to reach the minimum execution time of
18 seconds (building from scratch using bogoutil with tokens in random
order).  With a 16.5 MB cache the build took 1 minute 5 seconds; with
16.0 MB, a little over 3 minutes; with 15 MB, six and a half minutes;
and with 10 MB, also six and a half minutes (maybe a couple of seconds
longer than with 15 MB).  The default, 256 KB, took 26 minutes.  All
this was on ext3 with data=ordered.
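
To put those timings in proportion, here is a small Python sketch that
computes, for each run with an exactly stated time, the fraction of the
database the cache could hold and the slowdown relative to the 18-second
best case.  It assumes the cache sizes are binary megabytes (2^20 bytes);
the names RUNS and summarize are made up for this illustration, and the
16.0 MB and 10 MB runs are left out because their times are only
approximate.

```python
DB_BYTES = 20_000_000          # database size stated in the message
MIB = 1024 * 1024              # assumption: cache sizes are binary megabytes

# (cache size in MB, elapsed seconds), taken from the timings in the message
RUNS = [
    (17.0, 18),
    (16.5, 65),                # 1 minute 5 seconds
    (15.0, 390),               # six and a half minutes
    (0.25, 26 * 60),           # 256 KB default, 26 minutes
]

def summarize(runs, db_bytes=DB_BYTES):
    """Return (cache_mb, fraction_of_db_cached, slowdown_vs_fastest) rows."""
    baseline = min(seconds for _, seconds in runs)
    rows = []
    for cache_mb, seconds in runs:
        fraction = cache_mb * MIB / db_bytes
        rows.append((cache_mb, fraction, seconds / baseline))
    return rows

if __name__ == "__main__":
    for cache_mb, fraction, slowdown in summarize(RUNS):
        print(f"{cache_mb:6.2f} MB  {fraction:6.1%} of db  {slowdown:5.1f}x")
```

Under that assumption the 17 MB cache covers about 89% of the database,
and the 256 KB default comes out roughly 87 times slower than the best
case.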

I suspect that without write ordering, the writes are scattered over
too many pages for anything but a near-full-size cache to help.  I'm
running my production bogofilter at work with a 25 MB cache because
the goodlist there is over 30 million bytes.
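
The interaction between write order and cache size can be illustrated
with a toy model in Python.  This is only a sketch under loud
assumptions, not how BerkeleyDB works internally: keys are integers that
map to fixed pages in sorted order, the cache is a plain write-back LRU,
and page splits, journaling and ext3 behaviour are ignored.  The function
names and default sizes are invented for the example.

```python
import random
from collections import OrderedDict

def count_page_writes(keys, keys_per_page, cache_pages):
    """Count page writes to disk for a write-back LRU cache.

    Every key insertion dirties its page.  A page is written when it is
    evicted, and all pages still cached are flushed at the end.
    cache_pages must be at least 1.
    """
    cache = OrderedDict()              # page number -> dirty flag, oldest first
    writes = 0
    for key in keys:
        page = key // keys_per_page    # toy mapping: sorted keys share pages
        if page in cache:
            cache.move_to_end(page)    # mark as most recently used
            continue
        if len(cache) >= cache_pages:
            cache.popitem(last=False)  # evict least recently used dirty page
            writes += 1
        cache[page] = True
    return writes + len(cache)         # final flush of remaining dirty pages

def compare_orders(total_keys=20_000, keys_per_page=100, cache_pages=100, seed=0):
    """Return (page writes with random order, page writes with sorted order)."""
    ordered = list(range(total_keys))
    shuffled = ordered[:]
    random.Random(seed).shuffle(shuffled)
    return (count_page_writes(shuffled, keys_per_page, cache_pages),
            count_page_writes(ordered, keys_per_page, cache_pages))

if __name__ == "__main__":
    for cache in (200, 180, 100, 10):
        print(cache, compare_orders(cache_pages=cache))
```

In this model, sorted input writes each of the 200 pages exactly once
whatever the cache size, while random input does so only when the cache
holds every page; below that, evictions multiply the writes.  A real
B-tree would still rewrite some pages even with sorted input, as noted in
the quoted reply.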

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



