The Risk of Spam Complaints

David Relson relson at osagesoftware.com
Mon Oct 21 13:58:55 CEST 2002


At 04:18 AM 10/21/02, Boris 'pi' Piwinger wrote:

>Hi!
>
>I just got a false positive. It was a spam complaint I wrote, of
>course, including the original spam (quoted). I bcc'ed the address the
>spam was delivered to.
>
>Now clearly that mail of mine contained all the bad words. So I had to
>-N it. But then this makes the bad word better again. I don't have a
>solution to this, though.
>
>pi

Hi pi,

I've had two such occurrences. The first was a reply that quoted an email
I had sent showing spam calculations (words and their spamicity). The
second was a message in which someone sent me a tarball of their
wordlists.  Knowing the context of the messages, as a human I'd call them
non-spam.  Given that they contained lots of spammish words, bogofilter was
justified in calling them spam.  If I were doing manual filtering, I would
update neither word list.

One solution is a white list.  Mail from the bogofilter mailing lists is 
accepted, without updating any wordlists.  Perhaps I'll implement this as a 
procmail recipe...
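
As a rough illustration of the white list idea (not the procmail recipe
itself, which I haven't written), here is a minimal Python sketch. The
list address, the headers checked, and the names is_whitelisted and
handle are all assumptions made for the example; the classify argument
stands in for whatever actually runs bogofilter.

```python
from email import message_from_string

# Assumption: substrings that identify list traffic; adjust to the real list.
WHITELIST_MARKERS = ("bogofilter@aotto.com",)
# Assumption: headers that might carry the list address.
HEADERS_TO_CHECK = ("List-Id", "Sender", "To", "Cc")


def is_whitelisted(raw_message, markers=WHITELIST_MARKERS):
    """Return True if any checked header contains a whitelist marker."""
    msg = message_from_string(raw_message)
    for header in HEADERS_TO_CHECK:
        for value in msg.get_all(header, []):
            if any(marker.lower() in value.lower() for marker in markers):
                return True
    return False


def handle(raw_message, classify):
    """Accept whitelisted mail as-is; otherwise defer to the classifier.

    Whitelisted mail never reaches classify(), so no wordlist is
    consulted or updated for it.
    """
    if is_whitelisted(raw_message):
        return "ham"
    return classify(raw_message)
```

The point of the design is that whitelisted mail bypasses the filter
entirely, so it can neither be misclassified nor skew either wordlist.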

The bigger question is:  "What's the harm in (mis)classifying a few
messages about spam?"  I assert that there is no harm.  Currently my spam
list has 107,000 words and 6,000 messages and my non-spam list has 285,000
words and 29,000 messages.  Adding a few messages and their few hundred
words to the wrong list is going to have very little effect.
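
To put a rough number on that, here is a small Python sketch. It uses a
simplified, Graham-style per-word estimate, which is an assumption for
illustration and not necessarily the exact formula bogofilter applies.
The message totals are the ones quoted above; the per-word counts (300
spam messages, 10 non-spam messages) are hypothetical.

```python
SPAM_MSGS = 6000    # messages in the spam list (from the figures above)
GOOD_MSGS = 29000   # messages in the non-spam list (from the figures above)


def spamicity(spam_count, good_count, spam_msgs=SPAM_MSGS, good_msgs=GOOD_MSGS):
    """Simplified estimate: the word's spam frequency as a share of both frequencies."""
    bad = spam_count / spam_msgs
    good = good_count / good_msgs
    return bad / (bad + good)


# Hypothetical word: seen in 300 spam messages and 10 non-spam messages.
before = spamicity(300, 10)

# Two spam complaints containing the word get registered as non-spam.
after = spamicity(300, 12, good_msgs=GOOD_MSGS + 2)

print(f"before: {before:.4f}  after: {after:.4f}  shift: {before - after:.4f}")
```

With these hypothetical counts the word's score moves by well under one
percentage point. A word seen only a handful of times would move more,
but such a word carries little weight to begin with.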

David


For summary digest subscription: bogofilter-digest-subscribe at aotto.com


