Performance test framework

Adrian Otto aotto at aotto.com
Thu Sep 19 09:41:12 CEST 2002


Mark,

First of all I want to say thank you for your contribution. This is a very
clean script, and it's obvious that you are modest about your Perl skills.

> As briefly as possible, this script requires a directory
> full of messages (msg.*).  For every message M, it trains
> bogofilter on all the other messages !M.  It then  reports
> bogofilter's prediction for M.  It repeats this process
> for every message in the test directory.  Some final
> statistics are printed before it quits.

That's a creative way to do it. If we run this test under a timer on a
known data set, we will quickly be able to detect performance changes in
the message content registration functions.

It might also be valuable to have a few more modes for the test to run in:
a "training only" mode and a "classification only" mode.

In "training only" mode, the user creates three directories: one named
'spam' full of spam messages, a second named 'good' full of good messages,
and a third, with a name of the user's choosing, holding a test set of
messages and a spam list file like the one you described above. The script
removes any existing wordlist database files from the third directory,
trains bogofilter on the spam and good messages in the first two
directories, and stores the new wordlist database files in the third
directory.

The user then starts the script in "classification only" mode, specifying
the directory that holds the newly created wordlist database files. The
script evaluates all of the test messages in that directory using the
wordlists already created.

These modes let us benchmark the message registration code separately from
the message evaluation code. The evaluation code is the more interesting of
the two to benchmark.

Finally, it would help to have a simple /bin/sh script that runs each of
the three test modes through /usr/bin/time, so that standard stats are easy
to generate. Are you interested in adding the additional modes, or would
you like me to help out a little bit?
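
The timing wrapper could be as small as the sketch below. The script name
perftest.pl, its --mode option, and the mode names full, train and classify
are hypothetical placeholders, not the interface of the actual script. The
-p option asks time for the POSIX real/user/sys output format.

```shell
#!/bin/sh
# Sketch only: perftest.pl, --mode, and the mode names are hypothetical.
TIME_CMD=${TIME_CMD:-/usr/bin/time -p}
PERFTEST=${PERFTEST:-./perftest.pl}

# Run one test mode under the timer. time writes its stats to stderr.
run_timed() {
    mode=$1; shift
    echo "== $mode ==" >&2
    # $TIME_CMD is deliberately unquoted so that "-p" is split off as its
    # own argument.
    $TIME_CMD "$PERFTEST" --mode="$mode" "$@"
}

# Time all three modes in turn, passing any extra arguments (for example
# the test directory) through to each run.
run_all() {
    for mode in full train classify; do
        run_timed "$mode" "$@" || echo "mode $mode exited with status $?" >&2
    done
}
```

Adding a final line such as 'run_all "$@"' turns this into a script that
can be invoked with the test directory as its argument. Redirecting stderr
to a file keeps the timing output for comparison between runs.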

Thanks,

Adrian



More information about the bogofilter-dev mailing list