README.ext3

Matthias Andree matthias.andree at gmx.de
Sat Feb 1 16:12:36 CET 2003


On Fri, 31 Jan 2003, Greg Louis wrote:

> I thought there was a big problem with bogofilter-0.10 because of its
> database bloat, which on my system translated to terribly slow database
> access.  Turns out that's sort of true, but it's not nearly so bad if
> you don't use a journalling filesystem.  I did a moderately rigorous
> comparison, and as a result, here's a draft for yet another README.*
> file that might be useful:

We'd better get the performance issues fixed, or if there's a bug, we'd
better get that reported. ext2 is way inferior to ext3 in terms of
consistency, recovery and robustness. Given that the performance issues
cannot be reproduced, claiming that ext3 is slow in general is IMO
premature. My mbox is smaller than yours and hasn't turned up nearly as
many tokens, so it might really be a tuning issue or an issue with the
kernel version that you're using. Plus, priming the database with some
training data is an operation that isn't performed very often, so we can
live with that.

I'm very chary of recommending that people turn consistency guarantees
off. I have learnt that BDB isn't very robust against corruption, and if
something goes wrong, the user should at least notice.

> 3.  With ext3 in the data=journal mode (all data are committed to the
> journal prior to being written into the main file system)
> 
> # umount /xtrn
> # mount -t ext3 -o data=journal /dev/scd1 /xtrn
> # rm -f /xtrn/db/*
> # time /lighter/usr/bin/bogo10 -d /xtrn/db -v -s <spam_corpus 
> # 5868782 words, 14502 messages
> 
> real    14m11.143s
> user    2m34.430s
> sys     0m45.170s

This is a really interesting data point. Essentially, it suggests that
BDB might do many more synchronous operations than we are aware of, given
that this run takes only half the time of data=writeback.
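
One way to get a rough feel for how expensive synchronous writes are on a
given mount, independent of bogofilter and BDB, is a micro-benchmark along
these lines. This is only a sketch under assumptions: it relies on GNU dd's
oflag=dsync, the TESTDIR and COUNT variables and the write_test helper are
made up for illustration, and it says nothing about what BDB actually does
internally.

```shell
#!/bin/sh
# Hypothetical micro-benchmark: buffered vs. synchronous 4 KiB writes.
# TESTDIR is an assumption; point it at the mount under test (e.g. /xtrn).
TESTDIR="${TESTDIR:-$(mktemp -d)}"
COUNT="${COUNT:-200}"

write_test() {
    # $1 = label, $2 = extra dd option (may be empty)
    out="$TESTDIR/synctest.$1"
    rm -f "$out"
    start=$(date +%s)
    # $2 is deliberately unquoted so an empty value adds no argument
    dd if=/dev/zero of="$out" bs=4096 count="$COUNT" $2 2>/dev/null
    end=$(date +%s)
    echo "$1: $((end - start))s for $COUNT writes"
}

write_test buffered ""
# oflag=dsync (GNU dd) makes every write wait until the data is on disk
write_test dsync "oflag=dsync"
```

Running it once per mount option (data=writeback, data=ordered,
data=journal) with a larger COUNT should show whether the synchronous case
is the one that changes between journalling modes. The timer only has
one-second resolution, so small runs will mostly print 0s.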



