Do we need an exclusion list or something?

Eric Seppanen eds at reric.net
Fri Sep 13 21:26:25 CEST 2002


On Fri, Sep 13, 2002 at 03:18:36PM -0400, Paul Tomblin wrote:
> I was looking at a message that had been miscategorized as spam, and I see
> that most of the words returned by "bogofilter -v" with high numbers are
> ones that are on every single email message I receive, spam or not:
<snip>
> 
> "edt", "for", "esmtp", "with", "postfix", "allhats.xcski.com",
> "localhost", "from", "received", "allhats", "delivered-to", "return-path",
> "sep" and "xcski.com" are going to be in the headers of every single
> message I receive, spam or not.  How can I stop it from classifying these
> messages as spam?  Is it because the account this one is on hasn't
> received enough non-spam to train bogofilter properly?

In my opinion this kind of false signal will always be a problem.  I
spotted it when I fed bogofilter a bunch of spam messages from the month
of May and then found that the word "may" was being treated as a very
strong indicator of spamicity.
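
To illustrate why a date word like "may" ends up looking spammy, here is
a rough sketch of a simplified Graham-style per-word estimate.  It is an
assumption for illustration only, not bogofilter's actual formula; the
function name, the clamping bounds, and the example counts are all made up.

```python
def spamicity(spam_count, ham_count, n_spam, n_ham):
    """Simplified per-word spam estimate (illustrative, not bogofilter's code).

    spam_count / ham_count: messages of each kind containing the word.
    n_spam / n_ham: total messages of each kind used for training.
    """
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # word never seen: treat as neutral (hypothetical default)
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that no single word is ever treated as absolute proof.
    return min(0.99, max(0.01, p))


# Hypothetical counts: every training spam arrived in May, while only a
# few of the (much smaller) non-spam set did.
may_score = spamicity(spam_count=200, ham_count=3, n_spam=200, n_ham=50)

# The same word with balanced training data would come out neutral.
balanced_score = spamicity(spam_count=200, ham_count=50, n_spam=200, n_ham=50)
```

With these made-up counts the word scores well above 0.9 purely because
of when the training mail arrived, not because of what it says.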

I have written an "ignore-list" patch, but it depends on my "multi-list" 
patch, and I haven't received much feedback on that yet.
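
The general idea behind an ignore list can be sketched as follows.  This
is only an illustration of the concept, not the patch itself (which is
not shown in this message); the names IGNORE_WORDS and filter_tokens are
hypothetical, and the word list is taken from the examples in this thread.

```python
# Words that show up in the headers of every message on one particular
# system, plus date words, so they carry no real information.
IGNORE_WORDS = {
    "edt", "for", "esmtp", "with", "postfix", "localhost", "from",
    "received", "delivered-to", "return-path", "sep", "may",
}


def filter_tokens(tokens, ignore=IGNORE_WORDS):
    """Drop ignored words (case-insensitively) before any scoring is done."""
    return [t for t in tokens if t.lower() not in ignore]


tokens = ["Received", "from", "localhost", "viagra", "Sep", "offer"]
kept = filter_tokens(tokens)
```

Only the words that survive the filter would then be looked up in the
word lists, so header boilerplate could not push a message either way.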

If you want to test my "ignore-list" patch, let me know.


For summary digest subscription: bogofilter-digest-subscribe at aotto.com


