README.ext3

Matthias Andree matthias.andree at gmx.de
Sat Feb 1 17:32:42 CET 2003


On Fri, 31 Jan 2003, Greg Louis wrote:

> I thought there was a big problem with bogofilter-0.10 because of its
> database bloat, which on my system translated to terribly slow database
> access.  Turns out that's sort of true, but it's not nearly so bad if
> you don't use a journalling filesystem.  I did a moderately rigorous
> comparison, and as a result, here's a draft for yet another README.*

I've taken a closer look and ran bogofilter under strace to see what
Berkeley DB does. I see two fsync() calls, one for goodlist and one for
spamlist, in either mode.

I fed the mbox file of my current linux-kernel folder to bogofilter and
plotted the positions of the writes. That comes to roughly 25,000
accesses of one page each, to 697 distinct pages in the file, and a
single fsync().
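
The same shape of I/O can be reproduced without bogofilter. The C
function below is a minimal sketch under stated assumptions: it does not
use Berkeley DB at all, and its name (replay_page_writes), the page size
passed in and the pseudo-random page choice are hypothetical, picked only
to mimic "many one-page pwrite()s to a limited set of pages, then one
fsync()".

```c
#define _XOPEN_SOURCE 700   /* expose pwrite() and fsync() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical replay: n_writes page-sized pwrite()s to pseudo-randomly
 * chosen pages among n_pages, followed by a single fsync().
 * Returns the total number of bytes written, or -1 on error. */
long replay_page_writes(const char *path, size_t page_size,
                        unsigned n_pages, unsigned n_writes, unsigned seed)
{
    assert(page_size > 0 && n_pages > 0);
    char *page = malloc(page_size);
    if (page == NULL)
        return -1;
    memset(page, 0xAB, page_size);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(page);
        return -1;
    }

    long total = 0;
    unsigned state = seed;
    for (unsigned i = 0; i < n_writes; i++) {
        /* Small LCG so runs are reproducible; it does not model BDB's order. */
        state = state * 1103515245u + 12345u;
        off_t offset = (off_t)((state >> 16) % n_pages) * (off_t)page_size;
        ssize_t w = pwrite(fd, page, page_size, offset);
        if (w != (ssize_t)page_size) {
            total = -1;
            break;
        }
        total += (long)w;
    }

    /* One fsync() at the very end, as in the strace output. */
    if (total >= 0 && fsync(fd) != 0)
        total = -1;
    if (close(fd) != 0)
        total = -1;
    free(page);
    return total;
}
```

Calling, say, replay_page_writes("/tmp/replay.db", 4096, 697, 25000, 1)
from a small main() and running it under strace -e trace=pwrite64,fsync
should give a trace of the same shape (on Linux the call usually shows up
as pwrite64). Whether ext3 then behaves as it did with bogofilter is not
something this sketch establishes.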

I have two graphs. Figure 1 shows the positions of the writes in
chronological order, http://mandree.home.pages.de/bogofilter/writes.png;
the writes are pretty much unordered.
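
"Pretty much unordered" can also be put into numbers. A minimal sketch,
assuming the pwrite() offsets from the strace output have already been
divided by the page size and stored in chronological order; the names
write_order_stats and struct order_stats are hypothetical. It classifies
each step from one write to the next.

```c
#include <assert.h>
#include <stddef.h>

/* How ordered a chronological list of written page numbers is. */
struct order_stats {
    size_t sequential; /* next write is exactly the following page */
    size_t forward;    /* next write is further ahead, skipping pages */
    size_t backward;   /* next write goes back to an earlier page */
    size_t repeat;     /* the same page is written again */
};

struct order_stats write_order_stats(const unsigned *pages, size_t n)
{
    struct order_stats s = {0, 0, 0, 0};
    assert(pages != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (pages[i] == pages[i - 1])
            s.repeat++;
        else if (pages[i] > pages[i - 1]) {
            /* Subtract instead of adding 1 to avoid wrap-around. */
            if (pages[i] - pages[i - 1] == 1)
                s.sequential++;
            else
                s.forward++;
        } else
            s.backward++;
    }
    return s;
}
```

A mostly sequential workload would be dominated by the sequential count;
a scatter plot like writes.png would instead be expected to show up as
many backward and forward jumps.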

The other graph is less interesting: for fun, I plotted how often each
page was accessed, http://mandree.home.pages.de/bogofilter/frequency.png.
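
Both the per-page frequency and the number of distinct pages follow
directly from the write offsets, with page = offset / page size. A
sketch, assuming the offsets were collected from the pwrite() calls in
the strace output; the function name page_histogram and the caller-sized
counts array are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills counts[p] with how many writes hit page p (p = offset / page_size)
 * and returns the number of distinct pages touched. Offsets that are
 * negative or fall at or beyond max_pages * page_size are skipped. */
size_t page_histogram(const long long *offsets, size_t n, size_t page_size,
                      unsigned *counts, size_t max_pages)
{
    assert(page_size > 0);
    assert(counts != NULL || max_pages == 0);
    if (max_pages > 0)
        memset(counts, 0, max_pages * sizeof *counts);

    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        if (offsets[i] < 0)
            continue;
        unsigned long long page = (unsigned long long)offsets[i] / page_size;
        if (page >= max_pages)
            continue;
        if (counts[page]++ == 0) /* first hit on this page */
            distinct++;
    }
    return distinct;
}
```

max_pages must be at least the file size divided by the page size. The
return value corresponds to the "distinct pages" figure; feeding the same
offsets in with a larger page size can only keep it the same or make it
smaller, since several small pages fold into one large one.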

FreeBSD with BDB 3.3 shows similar patterns, though with a bigger page
size (16 KB), so only 121 distinct pages are touched. I didn't make
graphs for these.

While this is still empirical, I suspect that the kernel performs the
pwrite()s in FIFO order, but proving that would require hooking into
ll_rw_block(). A friend of mine did that for a different purpose (to
optimize power management on a laptop), but I'm not sure I still have
the module around.
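
Why FIFO order would matter can be shown with a toy cost model. Assuming
head travel is proportional to the distance between page numbers (a
strong simplification that ignores caching, the journal and the real
on-disk block layout), the sketch below compares writing the pages in
their original order with writing them in one ascending sweep, roughly
what an elevator-style I/O scheduler aims for. The function names are
hypothetical, and the sketch says nothing about what the kernel actually
does, which is the open question above.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Total head travel, in pages, when the pages are written in the given
 * order starting from page `start`. Toy model, not a real disk model. */
unsigned long long seek_distance(const unsigned *pages, size_t n, unsigned start)
{
    assert(pages != NULL || n == 0);
    unsigned long long total = 0;
    unsigned pos = start;
    for (size_t i = 0; i < n; i++) {
        total += pages[i] > pos ? pages[i] - pos : pos - pages[i];
        pos = pages[i];
    }
    return total;
}

static int cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Same writes visited in ascending page order (one sweep).
 * Leaves `pages` untouched; returns ULLONG_MAX if allocation fails. */
unsigned long long sorted_seek_distance(const unsigned *pages, size_t n,
                                        unsigned start)
{
    if (n == 0)
        return 0;
    unsigned *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return ULLONG_MAX;
    memcpy(copy, pages, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_unsigned);
    unsigned long long d = seek_distance(copy, n, start);
    free(copy);
    return d;
}
```

For pages {5, 1, 4, 2} starting at page 0, the original order travels 14
pages while the sorted sweep travels 5; the gap grows with the number of
scattered writes.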

