Robinson vs Graham - a testing methodology
David Relson
relson at osagesoftware.com
Fri Oct 25 02:46:15 CEST 2002
Greetings,
As many of you know, I've been working with Greg Louis to merge his
implementation of the Robinson algorithm into bogofilter. Some weeks ago, I
took an early version of his code and merged it into my private, test
version of bogofilter. Back then, I ran some Graham vs. Robinson
comparison tests. For my test, I had a set of 32 spam and 10 good messages
that bogofilter hadn't previously encountered. The Robinson algorithm
recognized about 50% more of the spam than did the Graham algorithm.
That was then. This is now.
Now we have the Robinson algorithm included in the current (CVS) source
tree for bogofilter. The big question is whether we should convert
bogofilter from the Graham algorithm to the Robinson algorithm, and why or
why not.
To answer these questions, I need to gather data, measure performance, and
compile statistics.
To further this work, I have an additional 17 days of email that have come
in since I started using bogofilter in production. This gives me an
additional several thousand messages, including some hundreds of spam
messages. I've been thinking about how to do some meaningful testing using
this data. Here are my thoughts, ideas, and plans...
First, I'll use bogofilter word lists that predate these new
messages. This prevents bogofilter from seeing a message it has already
been told is spam or is not spam.
Second, the testing will have two major components. One is counting and
the other is learning.
*** Counting ***
Each message is classified twice by bogofilter - once for each
algorithm. The message is categorized as:
G - only Graham indicates spam
R - only Robinson indicates spam
GR - both algorithms indicate spam
NN - both algorithms indicate non-spam
and the messages in each group are counted. This indicates how well the
algorithms do with the starting word lists.
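The counting step above could be sketched roughly as follows, in Python. This is only an illustration of the binning logic: classify_graham and classify_robinson are hypothetical callables standing in for however bogofilter is actually invoked with each algorithm, and each is assumed to return True when the message is judged to be spam.

```python
def bin_label(graham_spam, robinson_spam):
    """Map the two verdicts to one of the four bins: G, R, GR, NN."""
    if graham_spam and robinson_spam:
        return "GR"
    if graham_spam:
        return "G"
    if robinson_spam:
        return "R"
    return "NN"


def count_bins(messages, classify_graham, classify_robinson):
    """Classify every message with both algorithms and tally the bins.

    classify_graham and classify_robinson are hypothetical stand-ins
    (assumptions, not bogofilter's real interface); each takes a message
    and returns True for spam. The word lists are not modified here.
    """
    tallies = {"G": 0, "R": 0, "GR": 0, "NN": 0}
    for msg in messages:
        label = bin_label(classify_graham(msg), classify_robinson(msg))
        tallies[label] += 1
    return tallies
```

Because nothing is written back to the word lists, running this twice on the same messages should give the same tallies.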
*** Learning ***
Each message is again classified twice by bogofilter. The message again
goes into the G, R, GR, or NN bin for counting. However, immediately after
classification, each message classified as spam (by either or both
algorithms) is fed into the spam list and messages classified NN go into
the non-spam list. All messages are processed in this manner - classify
twice, then update word list. Again, the final counts of G, R, ... are
tallied (and saved). Any changes in the tallies reflect what bogofilter
has learned while processing this learning phase.
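A minimal sketch of one learning pass as described above, in Python. All four callables are hypothetical placeholders (assumptions for illustration, not bogofilter's real interface): the two classifiers return True for spam, and register_spam / register_nonspam stand for whatever updates the spam and non-spam word lists.

```python
def learning_pass(messages, classify_graham, classify_robinson,
                  register_spam, register_nonspam):
    """Classify each message twice, tally its bin, then update a word list.

    All four callables are hypothetical stand-ins. A message flagged by
    either algorithm is fed to the spam list; a message flagged by
    neither (NN) is fed to the non-spam list.
    """
    tallies = {"G": 0, "R": 0, "GR": 0, "NN": 0}
    for msg in messages:
        g = classify_graham(msg)
        r = classify_robinson(msg)
        # Builds "G", "R", or "GR"; an empty string falls back to "NN".
        label = (("G" if g else "") + ("R" if r else "")) or "NN"
        tallies[label] += 1
        # The update happens immediately, so it can affect later messages.
        if g or r:
            register_spam(msg)
        else:
            register_nonspam(msg)
    return tallies
```

Since the word lists change after every message, the order in which messages are processed can affect the tallies, unlike in the counting phase.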
The counting and learning phases are repeated several times, using the
updated wordlists each time. Message counts are saved. The goal of this
repetition is to measure the learning effect and see how quickly the counts
converge on a final result.
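The repetition could be driven by a small loop like the Python sketch below. run_round is a hypothetical callable that performs one counting-plus-learning round and returns the G/R/GR/NN tallies. The stopping rule (two consecutive rounds with identical tallies) and the max_rounds cap are assumptions made for the example; the plan above only says the phases are repeated several times.

```python
def run_rounds(run_round, max_rounds=10):
    """Repeat rounds and save the tallies from each one.

    run_round is a hypothetical callable returning a dict of bin counts.
    Stops early when two consecutive rounds give identical tallies
    (an assumed convergence test), or after max_rounds rounds.
    """
    history = []
    for _ in range(max_rounds):
        tallies = dict(run_round())  # copy so saved results are not aliased
        history.append(tallies)
        if len(history) >= 2 and history[-1] == history[-2]:
            break
    return history
```

The length of the returned history gives a rough measure of how quickly the counts settle.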
*** Results ***
I don't have them yet. I'm still putting together a test script and
testing it. I'll report when I have results.
David
--------------------------------------------------------
David Relson Osage Software Systems, Inc.
relson at osagesoftware.com Ann Arbor, MI 48103
www.osagesoftware.com tel: 734.821.8800