[bogofilter] spamitarium test results

Tom Anderson tanderso at oac-design.com
Mon May 10 16:29:32 CEST 2004


From: "Tom Allison" <tallison at tacocat.net>
> For each group of emails, the results were consistent and the values
> themselves were reasonably grouped.  There was a distinct difference
> between how a final score was achieved, with spamitarium showing the most
> accurate positive results when using block_on_subnets=yes and robs=1.0
> for all spamitarium arguments.  In fact, the differences between the
> raw scores of the different settings were negligible.  My guess is the
> DNS information added is more important than the header information that
> is stripped.  I did not run any tests to simply strip the headers
> without DNS/ASN information being added.

Something must have gone wrong: all of the spamitarium results were exactly
the same!  It's simply not probable that every combination (stripping
nonstandard headers or not, leaving out the helo field or not) produced
exactly the same number of false positives, unsures, and false negatives.
There would definitely be some variation from one test to the next; at the
very least, some unsures would jump from one side to the other.  Please
check that whatever command sequence was used actually passed the
parameters correctly.

I just tested a single random email and got the following result:
spamitarium -readw:   spamicity=0.499499
spamitarium -radw:    spamicity=0.499477
none:                 spamicity=0.487358
spamitarium -rdw:     spamicity=0.484318
spamitarium -sreadw:  spamicity=0.170268
spamitarium -sw:      spamicity=0.027598
spamitarium -srdw:    spamicity=0.022606

Clearly, the different options have a significant effect; some make a
message look more hammy, others more spammy.  You must have some emails in
your corpus that would also make a 0.3-to-0.5 jump.  Therefore, I must
question the testing process.

> I guess it's a matter of how you would like to judge the accuracy of the
> process, most positives or least false positives.

Your test slightly favors false positives by having robx > 0.5.  My
preference is to zero the false positives first, and then work on reducing
false negatives and unsures.  Therefore my robx < 0.5, to give new emails
the benefit of the doubt.  Training on error with repetition perhaps also
helps here; in any case, I don't seem to get any false positives.
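To see why robx matters so much for new emails: bogofilter scores tokens
with Gary Robinson's smoothed estimate f(w) = (s*x + n*p(w)) / (s + n),
where x is robx, s is robs, and n is the token's total count.  For a token
never seen before (n = 0), f(w) is exactly robx, so a value below 0.5
literally gives unknown tokens the benefit of the doubt.  A minimal sketch
(the function name and the example counts are mine, not from bogofilter's
source; 0.415 is bogofilter's default robx):

```python
def robinson_fw(spam_count, ham_count, robs=1.0, robx=0.415):
    """Robinson's smoothed per-token spam probability.

    robx is the prior assigned to unseen tokens; robs controls how
    strongly that prior is weighted against the observed counts.
    """
    n = spam_count + ham_count
    # Raw estimate p(w): fraction of this token's occurrences that were spam.
    p = spam_count / n if n else 0.0
    return (robs * robx + n * p) / (robs + n)

# An unseen token scores exactly robx, i.e. a hammy bias when robx < 0.5.
print(robinson_fw(0, 0))   # 0.415
# Observed counts pull the score away from the prior toward p(w).
print(robinson_fw(9, 1))   # roughly 0.856
```

With robx set above 0.5, the same unseen tokens would instead lean spammy,
which is why that choice trades unsures and false negatives for a higher
false-positive risk.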

Another thing I wish you'd include in the results is the wordlist sizes.  An
important role of spamitarium is reducing the "cruft" in email headers,
which should naturally lead to a smaller wordlist size.  If accuracy
differences are negligible, then wordlist size is an important deciding
factor of the effectiveness of the process.  In my opinion, the order of
importance is as follows: false positives, ham unsures, wordlist size, false
negatives, spam unsures.  Basically, the first three of those are the
measures of the negative effects of running a filter, which must be reduced
as near to zero as possible.  The last two are simply failings of the
positive effects, and would be present if no filter was running at all, thus
they are tolerable to an extent.

Tom



