regression tests
David Relson
relson at osagesoftware.com
Wed Jan 8 14:20:22 CET 2003
Gyepi,
Using the new and old mime parsers, I've run bogolexer on all the messages
in spam.mbx and good.mbx of the regression tests, diffed the outputs, and
identified the cause of the differences. There are 21 spam and 48 ham
messages in the two mailboxes. For most of them, the two lexers showed no
difference.
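
For anyone who wants to repeat this kind of comparison, here is a rough sketch of the procedure described above. It is not the script I actually used. It assumes each lexer build is a command that reads one message on standard input and writes one token per line; the command names in the usage note are hypothetical.

```python
import difflib
import mailbox
import subprocess


def lex(cmd, raw_message):
    """Run one lexer command on a raw message (bytes) and return its output lines.

    Assumption: the command reads the message on stdin and prints one token
    per line on stdout.
    """
    result = subprocess.run(cmd, input=raw_message, capture_output=True, check=True)
    return result.stdout.decode("latin-1").splitlines()


def token_diff(old_tokens, new_tokens):
    """Return (only_old, only_new): lines dropped from and added to the old output."""
    only_old, only_new = [], []
    for line in difflib.ndiff(old_tokens, new_tokens):
        if line.startswith("- "):
            only_old.append(line[2:])
        elif line.startswith("+ "):
            only_new.append(line[2:])
    return only_old, only_new


def compare_mailbox(path, old_cmd, new_cmd):
    """Yield (index, only_old, only_new) for each message whose outputs differ."""
    box = mailbox.mbox(path, create=False)
    for index, key in enumerate(box.iterkeys(), start=1):
        raw = box.get_bytes(key)
        only_old, only_new = token_diff(lex(old_cmd, raw), lex(new_cmd, raw))
        if only_old or only_new:
            yield index, only_old, only_new
```

Usage would look something like `for n, old, new in compare_mailbox("spam.mbx", ["./bogolexer.old"], ["./bogolexer.new"]): print(n, old, new)`, where the two command names stand in for whatever the old and new builds are called locally.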
Here's a summary of the differences:
  7 spam messages - new code has additional tokens for content-length,
                    content-disposition, or content-transfer-encodings
  5 spam messages - old code had tokens from inside html tags.
  1 good message  - new code has additional tokens for content-description
  2 good messages - new code has tokens for unusual Content-Types related to
                    delivery error
  1 good message  - new code has additional tokens for multipart/mixed
This is excellent! In _all_ cases the new code is right and the old code
was wrong.
I have updated the regression test outputs so all tests should pass after
your next update.
The new code has my blessing :-)
David
P.S. Attached is file new.old.txt which gives details of the test
outputs. It shows the diff output files, their sizes, and what the
differences are.
-------------- next part --------------
Size  Diff file         Side  Differences
 252  msg.02.s.txt.dif  new   content-length content-transfer-encoding bit
 266  msg.05.s.txt.dif  new   content-disposition inline content-transfer-encoding bit
 266  msg.10.s.txt.dif  new   content-disposition inline content-transfer-encoding bit
 266  msg.12.s.txt.dif  new   content-disposition inline content-transfer-encoding bit
 266  msg.14.s.txt.dif  new   content-disposition inline content-transfer-encoding bit
 266  msg.19.s.txt.dif  new   content-disposition inline content-transfer-encoding bit
 746  msg.04.s.txt.dif  new   content-disposition inline content-transfer-encoding bit
                        old   text/html, 7bit - had tokens from inside tags
1894  msg.06.s.txt.dif  old   text/html, quoted-printable - had tokens from inside tags
 604  msg.08.s.txt.dif  old   text/html, 7bit - had tokens from inside tags
1754  msg.15.s.txt.dif  old   text/html, 7bit - had tokens from inside tags
3327  msg.18.s.txt.dif  old   text/html, quoted-printable - had tokens from inside tags
 651  msg.05.g.txt.dif  new   text/html, 7bit (not mime-multipart)
1041  msg.27.g.txt.dif  new   content-description notification
                        new   Delivery error report, message/delivery-status
                        new   Undelivered Message Headers, text/rfc822-headers
2833  msg.29.g.txt.dif  new   multipart/mixed
--------------------------------------------------------
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com    Ann Arbor, MI 48103
www.osagesoftware.com          tel: 734.821.8800