README.ext3

Greg Louis glouis at dynamicro.on.ca
Sun Feb 2 01:02:59 CET 2003


On 20030201 (Sat) at 1612:36 +0100, Matthias Andree wrote:

> We'd better get the performance issues fixed, or if there's a bug, we'd
> better get that reported. ext2 is way inferior to ext3 in terms of
> consistency, recovery or robustness.

Recovery I will buy; can you supply pointers to evidence for the other
two?

> Given that the performance issues
> cannot be reproduced, claiming ext3 to be slow generally is IMO
> premature. My mbox has been smaller than yours and hasn't turned up
> nearly as many tokens

Of course the performance issues cannot be reproduced if you don't
reproduce the conditions.  Sheesh...

>, so it might really be a tuning issue or an
> issue with the kernel version that you're using.

Yes, that is true, it might.  We need to know.  What kernel version are
_you_ using? -- I have one machine I could borrow to check that.

> Plus, priming the data
> base with some training data is an operation that isn't performed very
> often, so we can live with that.

The email to which you were replying mentioned that with ext3 a
_classification_ takes me four times as long as if the db files are on
ext2.  Maybe _you_ can live with that...

> I'm very chary of recommending that people turn consistency guarantees
> off. I have learnt that BDB isn't very robust against corruption, and if
> something goes wrong, the user should at least notice.

Yes, I agree with you here.  Some sort of external recovery strategy is
definitely needed if the db files are on ext2 or ext3/writeback.  As
in, "he who laughs last probably made a backup."  Probably the warning
I gave is worded too gently.

> > 3.  With ext3 in the data=journal mode (all data are committed to the
> > journal prior to being written into the main file system)
> > 
> > # umount /xtrn
> > # mount -t ext3 -o data=journal /dev/scd1 /xtrn
> > # rm -f /xtrn/db/*
> > # time /lighter/usr/bin/bogo10 -d /xtrn/db -v -s <spam_corpus 
> > # 5868782 words, 14502 messages
> > 
> > real    14m11.143s	user    2m34.430s	sys     0m45.170s
> 
> This is a really interesting data point. Essentially, it means that
> BDB might do many more synchronous operations than we are aware of, given
> that this takes only half the time of data=writeback.

Of data=ordered.  Writeback is only slightly slower than ext2.  Just a
typo, no doubt.  I too was interested that journal is almost twice as
fast as ordered, but I don't know enough about db internals to turn
that to advantage.
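
One thing the quoted data=journal timings do show is how little of the
wall-clock time is CPU time, which at least fits the idea that the run
spends most of its time waiting rather than computing.  A minimal sketch
of that arithmetic, assuming the bash-style `time` format quoted above;
`to_seconds` is a helper name made up here for illustration, not
anything that ships with bogofilter:

```shell
#!/bin/sh
# Convert a time(1)-style value such as "14m11.143s" to seconds.
# to_seconds is a hypothetical helper, not part of bogofilter.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Figures copied from the data=journal run quoted above.
real=$(to_seconds 14m11.143s)
user=$(to_seconds 2m34.430s)
sys=$(to_seconds 0m45.170s)

# Percentage of wall-clock time accounted for by user + sys CPU time.
cpu_share=$(echo "$real $user $sys" | awk '{ printf "%.0f", 100 * ($2 + $3) / $1 }')
echo "real=${real}s cpu=${cpu_share}%"
```

For that run it works out to roughly 23% CPU, so something like three
quarters of the elapsed time is not accounted for by user or sys time.
That alone does not say where the waiting happens; it only suggests the
run is not CPU-bound.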

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



