classification error ???

Eric Seppanen eds at reric.net
Fri Sep 13 19:21:20 CEST 2002


On Fri, Sep 13, 2002 at 12:56:20PM -0400, David Relson wrote:
> About a year ago, my son received a bunch of spam titled "hello babe".  I 
> eventually forwarded one to abuse at yahoo.com.  Whatever the story, in one of 
> my tests I ran both the original message and the forwarded message (which 
> includes a full copy of the original) through bogofilter.
> 
> The original was classified as spam and the copy as nonspam.  I ran 
> bogofilter in verbose mode to see the words used for calculating the 
> spamicity.  The original had a reasonable list of words (with both high 
> and low probabilities in it), but the copy had a totally different word set 
> and all its words had probability 0.010000.
> 
> What the heck is going on?

It's probably picking up on header words from the forwarded message, and 
recognizing them as strong indicators of "hamness" (opposite of 
"spamness").  I noticed the same thing when I tried to forward a few spam 
messages from another account for testing.
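
To illustrate how a handful of header words could swamp the spam words, 
here is a minimal sketch in Python.  It is not bogofilter's code.  It 
assumes a Graham-style calculation in which only the 15 tokens farthest 
from the neutral 0.5 are combined and the lowest possible token 
probability is 0.01 (consistent with the 0.010000 values reported above).  
The function name and all token probabilities are made up for the example.

```python
from math import prod

def spamicity(token_probs, max_tokens=15):
    """Combine the most extreme token probabilities, Graham-style.

    Assumption: only the `max_tokens` values farthest from 0.5 count.
    """
    extreme = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    extreme = extreme[:max_tokens]
    spam = prod(extreme)                  # evidence for spam
    ham = prod(1.0 - p for p in extreme)  # evidence for ham
    return spam / (spam + ham)

# Hypothetical token probabilities for the original spam message.
original = [0.98] * 6 + [0.95] * 5 + [0.90] * 4 + [0.30] * 3

# The forwarded copy carries the same words plus 15 header words that
# (by assumption) have only ever been seen in non-spam mail.
forwarded = original + [0.01] * 15

print("original :", spamicity(original))
print("forwarded:", spamicity(forwarded))
```

In this sketch the fifteen 0.01 tokens are more extreme than any of the 
spam tokens, so they fill every slot and the spam words never enter the 
calculation.  That would match a verbose listing in which every word 
shows 0.010000.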

Bogofilter is working as designed in this case, and it's not clear to me 
whether a workaround is feasible.  I would imagine that a test mode which 
ignores the headers would help: it would let you see how bogofilter 
scores the message that's included.  I'm not sure this is needed for any 
"production" reasons.

Should probably be a FAQ entry, though.

For summary digest subscription: bogofilter-digest-subscribe at aotto.com


