regression tests

David Relson relson at osagesoftware.com
Wed Jan 8 14:20:22 CET 2003


Gyepi,

Using the new and old mime parsers, I've run bogolexer on all the messages 
in spam.mbx and good.mbx of the regression tests, diffed the outputs, and 
identified the cause of the differences.  There are 21 spam and 48 ham 
messages in the two mailboxes.  For most of them, the two lexers showed no 
difference.
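
A minimal sketch of the comparison described above, assuming each lexer run has already been captured as a list of tokens (one per output line). The function name and the sample tokens are hypothetical and are not taken from the bogofilter test scripts; it only illustrates how "tokens only in the old output" and "tokens only in the new output" can be separated.

```python
import difflib

def token_diff(old_tokens, new_tokens):
    """Return (only_old, only_new): tokens dropped and tokens added
    when going from the old lexer output to the new one."""
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    only_old, only_new = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' cover tokens present only in the old output
        if tag in ("delete", "replace"):
            only_old.extend(old_tokens[i1:i2])
        # 'insert' and 'replace' cover tokens present only in the new output
        if tag in ("insert", "replace"):
            only_new.extend(new_tokens[j1:j2])
    return only_old, only_new

# Hypothetical token streams for one message
old = ["content-type", "text", "html", "font", "hello"]
new = ["content-type", "text", "html", "hello", "content-disposition", "inline"]
removed, added = token_diff(old, new)
```

Here `removed` holds a token the old parser took from inside an html tag, and `added` holds the extra header tokens the new parser emits.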

Here's a summary of the differences:

7 spam messages - new code has additional tokens for content-length, 
content-disposition, or content-transfer-encodings
5 spam messages - old code had tokens from inside html tags.

1 good message  - new code has additional tokens for content-description
2 good messages - new code has tokens for unusual Content-Types related to 
delivery error
1 good message  - new code has additional tokens for multipart/mixed

This is excellent!  In _all_ cases the new code is right and the old code 
was wrong.

I have updated the regression test outputs so all tests should pass after 
your next update.

The new code has my blessing :-)

David

P.S. Attached is file new.old.txt which gives details of the test 
outputs.  It shows the diff output files, their sizes, and what the 
differences are.
-------------- next part --------------
  252  msg.02.s.txt.dif - new - content-length content-transfer-encoding bit
  266  msg.05.s.txt.dif - new - content-disposition inline content-transfer-encoding bit
  266  msg.10.s.txt.dif - new - content-disposition inline content-transfer-encoding bit
  266  msg.12.s.txt.dif - new - content-disposition inline content-transfer-encoding bit
  266  msg.14.s.txt.dif - new - content-disposition inline content-transfer-encoding bit
  266  msg.19.s.txt.dif - new - content-disposition inline content-transfer-encoding bit

  746  msg.04.s.txt.dif - new - content-disposition inline content-transfer-encoding bit
			- old - text/html, 7bit - had tokens from inside tags
 1894  msg.06.s.txt.dif - old - text/html, quoted-printable - had tokens from inside tags
  604  msg.08.s.txt.dif - old - text/html, 7bit - had tokens from inside tags
 1754  msg.15.s.txt.dif - old - text/html, 7bit - had tokens from inside tags
 3327  msg.18.s.txt.dif - old - text/html, quoted-printable - had tokens from inside tags

  651  msg.05.g.txt.dif - new - text/html, 7bit (not mime-multipart)
 1041  msg.27.g.txt.dif - new - content-description notification
			- new - Delivery error report, message/delivery-status
			- new - Undelivered Message Headers, text/rfc822-headers
 2833  msg.29.g.txt.dif - new - multipart/mixed

--------------------------------------------------------
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800
