README.ext3
Matthias Andree
matthias.andree at gmx.de
Sat Feb 1 16:12:36 CET 2003
On Fri, 31 Jan 2003, Greg Louis wrote:
> I thought there was a big problem with bogofilter-0.10 because of its
> database bloat, which on my system translated to terribly slow database
> access. Turns out that's sort of true, but it's not nearly so bad if
> you don't use a journalling filesystem. I did a moderately rigorous
> comparison, and as a result, here's a draft for yet another README.*
> file that might be useful:
We'd better get the performance issues fixed, or if there's a bug, we'd
better get that reported. ext2 is way inferior to ext3 in terms of
consistency, recovery or robustness. Given that the performance issues
cannot be reproduced, claiming that ext3 is generally slow is IMO
premature. My mbox is smaller than yours and hasn't turned up nearly
as many tokens, so it might really be a tuning issue or an issue with
the kernel version that you're using. Plus, priming the database with
some training data is an operation that isn't performed very often, so
we can live with that.
I'm very chary about recommending that people turn consistency
guarantees off. I have learnt that BDB isn't very robust against
corruption, and if something goes wrong, the user should at least notice.
> 3. With ext3 in the data=journal mode (all data are committed to the
> journal prior to being written into the main file system)
>
> # umount /xtrn
> # mount -t ext3 -o data=journal /dev/scd1 /xtrn
> # rm -f /xtrn/db/*
> # time /lighter/usr/bin/bogo10 -d /xtrn/db -v -s <spam_corpus
> # 5868782 words, 14502 messages
>
> real 14m11.143s user 2m34.430s sys 0m45.170s
This is a really interesting data point. Essentially, it means that
BDB might do many more synchronous operations than we are aware of,
given that this takes only half the time of data=writeback.
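
One way to check that suspicion would be to count the sync-type system
calls made during the training run. The following is only a rough
sketch: it assumes strace is installed, the list of syscalls (fsync,
fdatasync, msync, sync) is my guess at what matters here, and the
function names sum_sync_calls and count_syncs are made up for
illustration.

```shell
#!/bin/sh
# Sketch: count sync-type syscalls made by a command, via "strace -c".
# Assumptions: strace is available; the syscall list below is a guess.

# Sum the "calls" column of an strace -c summary for sync-type syscalls.
# The syscall name is always the last field; "calls" is the fourth.
sum_sync_calls() {
    awk '$NF ~ /^(fsync|fdatasync|msync|sync)$/ { n += $4 } END { print n + 0 }' "$@"
}

# Run a command under strace (following children) and print the total
# number of sync-type calls it made. The command's stdout is discarded.
count_syncs() {
    summary=$(mktemp) || return 1
    strace -f -c -e trace=fsync,fdatasync,msync,sync -o "$summary" "$@" >/dev/null
    sum_sync_calls "$summary"
    rm -f "$summary"
}
```

With these loaded, something like
"count_syncs /lighter/usr/bin/bogo10 -d /xtrn/db -v -s <spam_corpus"
could be run once per mount mode. A large count would support the idea
that the time goes into synchronous writes rather than into the file
system as such.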