Bogofilter simulator.

Matthias Andree matthias.andree at gmx.de
Tue Feb 4 00:42:04 CET 2003


Greg, other interested parties,

I have a program that simulates the access pattern seen in our
bogofilter problem. The distribution is not exactly the same (it's
slightly more homogeneous with respect to frequencies), and I haven't
compared \chi^2 or anything similar, but I find it good enough:

ext3 + SCSI     -> fast, 24 s   (aic7880 + fujitsu mah3182mp)
reiserfs + IDE  -> fast, 23 s   (via686 + ibm djna-352030)
ext3 + IDE      -> slow, >8 min (via686 + ibm dtla-307045)

Please check whether you can reproduce your bad bogofilter times with
this program and let me know. I want to be sure this works before we
pass it on to Stephen and Andrew on the ext3 users list. To use:

1. gcc -W -Wall -O -o simbf simbf.c
2. time ./simbf >/path/to/file # the first 20,000 kByte of this file ARE LOST
3. rm /path/to/file

Make sure you don't redirect stdout of the simbf program to a file that
holds important data (preferably, the file should not exist before the
test).

I didn't bother with any user convenience: you'll have to edit the
source and recompile if you want to change the file size (PAGESIZE *
PAGES) or the number of pwrites (WRITES).
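
For readers without the source at hand, here is a minimal sketch of the
kind of loop those three constants control. This is not the actual
simbf.c: the function name sim_random_pwrites, the uniform choice of
pages, the seed parameter and the WRITES value are all assumptions made
for illustration. Only the product PAGESIZE * PAGES = 20,000 kByte
follows from the usage notes above; the 4096 * 5000 split is a guess.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values: only PAGESIZE * PAGES = 20,000 kByte is taken from
 * the message; the split and the WRITES count are illustrative. */
#define PAGESIZE 4096
#define PAGES    5000
#define WRITES   10000

/* Write `writes` pages of `pagesize` bytes at random page-aligned
 * offsets within the first pagesize * pages bytes of fd.
 * Returns 0 on success, -1 on bad arguments or on the first failed
 * or short pwrite(). */
int sim_random_pwrites(int fd, size_t pagesize, long pages,
                       long writes, unsigned seed)
{
    char *buf;
    long i;

    if (pagesize == 0 || pages <= 0)
        return -1;
    buf = malloc(pagesize);
    if (buf == NULL)
        return -1;
    memset(buf, 'x', pagesize);      /* payload content is irrelevant */

    srand(seed);
    for (i = 0; i < writes; i++) {
        /* Uniform page choice: an assumption, the real program may
         * use a different frequency distribution. */
        off_t page = (off_t)(rand() % pages);

        if (pwrite(fd, buf, pagesize, page * (off_t)pagesize)
            != (ssize_t)pagesize) {
            free(buf);
            return -1;
        }
    }
    free(buf);
    return 0;
}
```

To mimic the "./simbf >/path/to/file" usage, a main() would call
sim_random_pwrites(STDOUT_FILENO, PAGESIZE, PAGES, WRITES, 1) and
return a nonzero exit status on failure.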

BTW, is there decent literature on figuring out how "similar" two random
distributions are? Does someone have a reference handy?
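
As a starting point for the \chi^2 comparison mentioned above, here is a
sketch of the usual two-sample chi-square statistic on binned counts
(for example, writes per page in the real trace versus the simulated
one). The function name chi2_two_sample and the binning are assumptions
for illustration. The statistic still has to be compared against a
chi-square distribution with the appropriate degrees of freedom to get
a significance level, which this sketch does not do.

```c
#include <stddef.h>

/* Two-sample chi-square statistic for two histograms a[] and b[] with
 * the same nbins bins and possibly different totals A and B:
 *
 *   chi2 = sum_i (B*a_i - A*b_i)^2 / (A * B * (a_i + b_i))
 *
 * Bins that are empty in both histograms are skipped.
 * Returns -1.0 if either histogram has no counts at all. */
double chi2_two_sample(const unsigned long *a, const unsigned long *b,
                       size_t nbins)
{
    double total_a = 0.0, total_b = 0.0, chi2 = 0.0;
    size_t i;

    for (i = 0; i < nbins; i++) {
        total_a += (double)a[i];
        total_b += (double)b[i];
    }
    if (total_a == 0.0 || total_b == 0.0)
        return -1.0;

    for (i = 0; i < nbins; i++) {
        double ai = (double)a[i], bi = (double)b[i];
        double d;

        if (ai + bi == 0.0)
            continue;                 /* no information in this bin */
        d = total_b * ai - total_a * bi;
        chi2 += d * d / (total_a * total_b * (ai + bi));
    }
    return chi2;
}
```

Histograms with the same shape give a value of 0 regardless of their
totals; the value grows as the two sets of bin frequencies diverge.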

-- 
Matthias Andree



