Performance test framework

Mark M. Hoffman mhoffman at lightlink.com
Thu Sep 19 18:15:19 CEST 2002


David:

* David Relson <relson at osagesoftware.com> [2002-09-19 09:19:45 -0400]:
> Mark,
> 
> A nice start at a test framework!  I've given it a quick test and like 
> it.  Naturally I have a couple of comments :-)
> 
> When testing, I like to generate output for each message, so I changed
> 
> 	$prediction = system("$bogofilter < $msg") ;
> to:
> 	$prediction = system("$bogofilter < $msg >$msg.out") ;
> 
> This worked fine - once.  The second pass, it also processed msg.*.out and 
> generated msg.*.out.out files.  My fix was to name test messages as *.msg, 
> rather than msg.*.  This allows me to generate *.out files.

I'm quite attached to "msg.*" because that's what procmail generates when
you point its output at a directory.  Would you mind if it just ignores
msg.*.out?

> A second idea is based on the n^2 nature of the work.  This happens 
> because new word list databases are created for testing each message.  Why 
> not separate the word list creation from the testing, i.e.
> 
> 	pass 1:  for each xyz.msg, generate hamlist.db and spamlist.db and save
> 	them as xyz.ham.db and xyz.spam.db
> 	pass 2:  for each xyz.msg, copy xyz.ham.db and xyz.spam.db back to
> 	hamlist.db and spamlist.db, then test xyz.msg
> 
> The word list generation (pass 1) would only need to be done when 
> additional messages are added to the test suite.  The testing (pass 2) 
> could be done as often as desired.
> 
> Yes, this would use disk space, but it would save lots of time while 
> running tests.  A modification of the idea would be to save the two 
> databases in a directory specific to the message, i.e. for xyz.msg save 
> them as xyz.db/hamlist.db and xyz.db/spamlist.db.  This would allow 
> bogofilter to directly access the word lists by using the "-d xyz.db" flag.
<cut>

That's OK as long as the way words are counted doesn't change between
runs (while you're presumably tweaking other parts of bogofilter); if it
does, the saved word lists have to be regenerated.  I'll put in separate
"train" and "evaluate" modes as you and Adrian suggest.
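
One way the split could look, as a rough Python sketch and not the
framework's real code.  It assumes the per-message directory layout David
describes (word lists for a message kept in a directory of its own), and
the names "train", "evaluate", "build_wordlists", "classify" and
"cache_dir" are all hypothetical.  The two callables stand in for whatever
actually builds the word lists and runs bogofilter:

```python
import os
import shutil


def train(messages, build_wordlists, cache_dir):
    """Pass 1: for each message, build word lists from all the others.

    messages maps a message path to its label ("ham" or "spam").
    """
    for msg in messages:
        db_dir = os.path.join(cache_dir, os.path.basename(msg) + ".db")
        if os.path.isdir(db_dir):
            shutil.rmtree(db_dir)  # drop stale lists from an earlier run
        os.makedirs(db_dir)
        # Hold the message under test out of its own training set.
        others = {p: label for p, label in messages.items() if p != msg}
        build_wordlists(others, db_dir)


def evaluate(messages, classify, cache_dir):
    """Pass 2: test each message against its saved word lists."""
    results = {}
    for msg in messages:
        db_dir = os.path.join(cache_dir, os.path.basename(msg) + ".db")
        results[msg] = classify(msg, db_dir)
    return results
```

The expensive n^2 work is confined to train(); evaluate() can then be
repeated cheaply while other parts of bogofilter change.  A real classify
would presumably point bogofilter at db_dir with the "-d" flag mentioned
above.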

Regards,

-- 
Mark M. Hoffman
mhoffman at lightlink.com


