classification error ???

Mark M. Hoffman mhoffman at lightlink.com
Fri Sep 13 19:36:25 CEST 2002


Hello:

* David Relson <relson at osagesoftware.com> [2002-09-13 12:56:20 -0400]:
<snip>
> 
> The original was classified as spam and the copy as nonspam.  I ran 
> bogofilter in verbose mode to see the words used for calculating the 
> spamicity.  The original had a reasonable list of words ( with both high 
> and low probabilities in it), but the copy had a totally different word set 
> and all its words had probability 0.010000.
> 

In a message containing more than 15 words of maximum deviation (.49 from
the neutral 0.5, i.e. a probability of 0.01 or 0.99), bogofilter will choose
the first 15 of them.  In your copy, the headers would be completely
different from the original's, and they are among the first tokens that
bogofilter sees.  If those header tokens all score 0.01, that would explain
why every word in the copy's list shows 0.010000.  I consider it a feature
of bogofilter that a header token of max deviation is chosen over any later
token in the body, whatever its score.  The downside is that the spammy
disclaimers at the bottom of a long spam could be ignored.
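
To make the tie-breaking concrete, here is a rough sketch in Python of the
selection as I describe it above.  It is not bogofilter's actual code: the
names pick_tokens, MAX_TOKENS and UNKNOWN_PROB are made up for illustration,
and the 0.4 score for unknown words is an assumption.

```python
MAX_TOKENS = 15      # how many tokens feed the spamicity calculation
UNKNOWN_PROB = 0.4   # assumed score for a word not in the wordlists

def pick_tokens(tokens, prob, keep=MAX_TOKENS):
    """Return up to `keep` distinct tokens, largest |p - 0.5| first.

    Ties are broken by position, so among tokens of equal deviation
    the one seen earliest in the message wins.
    """
    seen = set()
    ranked = []
    for pos, tok in enumerate(tokens):
        if tok in seen:
            continue  # only the first occurrence of a token counts
        seen.add(tok)
        p = prob.get(tok, UNKNOWN_PROB)
        # Round so that 0.01 and 0.99 compare as exactly the same deviation.
        dev = round(abs(p - 0.5), 6)
        ranked.append((-dev, pos, tok))
    ranked.sort()  # largest deviation first, then earliest position
    return [tok for _, _, tok in ranked[:keep]]

# A message whose 20 header tokens all score 0.01 and whose
# 20 body tokens all score 0.99 (hypothetical data).
headers = ["hdr%d" % i for i in range(20)]
body = ["spam%d" % i for i in range(20)]
prob = {t: 0.01 for t in headers}
prob.update({t: 0.99 for t in body})
chosen = pick_tokens(headers + body, prob)
```

With this input, chosen holds only the first 15 header tokens.  Every body
token has the same deviation but arrives later, so none of them makes the
list, however spammy it is.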

But besides all that, does it even make sense for an email that you sent to
be marked as spam?  I mean, it *wasn't*, right? ;)

Regards,

-- 
Mark M. Hoffman
mhoffman at lightlink.com




