classification error ???
eds at reric.net
Fri Sep 13 13:21:20 EDT 2002
On Fri, Sep 13, 2002 at 12:56:20PM -0400, David Relson wrote:
> About a year ago, my son received a bunch of spam titled "hello babe". I
> eventually forwarded one to abuse at yahoo.com. Whatever the story, in one of
> my tests I ran both the original message and the forwarded message (which
> includes a full copy of the original) through bogofilter.
> The original was classified as spam and the copy as nonspam. I ran
> bogofilter in verbose mode to see the words used for calculating the
> spamicity. The original had a reasonable list of words (with both high
> and low probabilities in it), but the copy had a totally different word set
> and all its words had probability 0.010000.
> What the heck is going on?
It's probably picking up on header words from the forwarded message, and
recognizing them as strong indicators of "hamness" (opposite of
"spamness"). I noticed the same thing when I tried to forward a few spam
messages from another account for testing.
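To illustrate why a handful of strongly "hammy" header words can swamp the spam words, here is a rough sketch. It assumes a Paul Graham style calculation (keep the 15 tokens farthest from 0.5, then combine them); the function name, the token probabilities, and the cutoff of 15 are illustrative assumptions, not taken from the bogofilter source.

```python
from math import prod

def spamicity(token_probs, keep=15):
    """Combine per-token spam probabilities, Graham style (assumed method)."""
    # Keep only the tokens whose probability is farthest from the neutral 0.5.
    extreme = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam = prod(extreme)                  # evidence that the message is spam
    ham = prod(1.0 - p for p in extreme)  # evidence that the message is ham
    return spam / (spam + ham)

# Made-up probabilities for the words of the original spam.
original = [0.99, 0.99, 0.95, 0.90, 0.20, 0.10, 0.99, 0.85]

# The forwarded copy has the same words plus header words that were only
# ever seen in good mail, so each sits at the 0.01 floor.
forwarded = original + [0.01] * 15

print(spamicity(original))   # close to 1
print(spamicity(forwarded))  # close to 0
```

Because the 0.01 tokens are as far from 0.5 as any token can be, they crowd most of the spam words out of the top 15, and the combined score collapses toward zero.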
Bogofilter is working as designed in this case, and it's not clear to me
whether a workaround is feasible. I would imagine adding a test mode
where the headers are ignored would help. It would let you test how
bogofilter likes the message that's included. It's not clear to me that
this is needed for any "production" reasons.
Should probably be a FAQ entry, though.
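Until something like the header-ignoring test mode exists, the same effect can be approximated outside bogofilter by stripping the headers before classifying. The following is only a sketch using Python's standard email module; body_only is a hypothetical helper, not a bogofilter feature, and the sample message is made up.

```python
from email import message_from_string

def body_only(raw_message):
    """Return the text of a message with its own headers removed."""
    msg = message_from_string(raw_message)
    parts = []
    for part in msg.walk():
        # Collect text parts only; multipart containers and attachments are skipped.
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "ascii"
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)

sample = (
    "From: a@example.com\n"
    "Subject: Fwd: hello babe\n"
    "Received: from mail.example.com\n"
    "\n"
    "hello babe, click here\n"
)
print(body_only(sample))
```

The result could then be fed to bogofilter on standard input. Note that this only drops the headers of the forwarding message; if the forwarded copy carries the original headers inline in its body, those would still be tokenized.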
For summary digest subscription: bogofilter-digest-subscribe at aotto.com