Pre-filtering

evan at coolrunningconcepts.com
Sun May 22 22:09:31 CEST 2005


I'm not a list member, but wanted to share some research I've done on a spam
detection idea - hopefully this message won't bounce, and any replies or
questions should be sent to me directly.

This method is quite effective, and I've found it has an incredibly low
probability of false positives.  While it is not directly related to bogofilter,
you may find it makes a good "pre-filter" for removing obvious spam before
submitting to bogofilter, or for helping with bogofilter's training.

First, this algorithm does use DNS blocklists, but the traditional use of
looking up the sender in a DNS blocklist is optional, and discouraged as it can
lead to false positives.  Instead, the message is URL-decoded and MIME-decoded.
The body of the message is then searched for anything that looks like a URL, be
it a clickable link or external image, or just text that says to go to a
particular website or email a particular address.  The URL's host portion
is then resolved to an IP, and this IP is looked up in your favorite DNS
blocklist.  I use xbl-sbl.spamhaus.org.
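
As a rough illustration of the steps above, here is a minimal Python sketch.  It
is not the code behind this research; the function names, the host-matching
pattern, and the injectable resolve parameter are all assumptions made for the
example, and the blocklist zone is passed in by the caller rather than fixed.

```python
import re
import socket
from email import message_from_string, policy
from urllib.parse import unquote

# Hypothetical, deliberately loose pattern: a host name that follows
# "http://" or "https://", follows the "@" of an e-mail address, or
# starts with "www.".  URLs with a literal IP as host are not matched.
HOST_RE = re.compile(
    r"(?:https?://|[\w.+-]+@|\b(?=www\.))((?:[a-z0-9-]+\.)+[a-z]{2,})",
    re.IGNORECASE,
)


def decoded_text(raw_message):
    """MIME-decode every text part, then URL-decode the result."""
    msg = message_from_string(raw_message, policy=policy.default)
    parts = [
        part.get_content()
        for part in msg.walk()
        if part.get_content_maintype() == "text"
    ]
    return unquote("\n".join(parts))


def extract_hosts(text):
    """Return the unique host names found in text, in order of appearance."""
    hosts = []
    for match in HOST_RE.finditer(text):
        host = match.group(1).lower()
        if host not in hosts:
            hosts.append(host)
    return hosts


def dnsbl_name(ip, zone):
    """Build the query name: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def listed_hosts(raw_message, zone, resolve=socket.gethostbyname):
    """Return the hosts in the message whose IP is listed in zone."""
    hits = []
    for host in extract_hosts(decoded_text(raw_message)):
        try:
            ip = resolve(host)
            answer = resolve(dnsbl_name(ip, zone))
        except OSError:
            continue  # host does not resolve, or its IP is not listed
        # Blocklists conventionally answer with an address in 127.0.0.0/8.
        if answer.startswith("127."):
            hits.append(host)
    return hits
```

Calling listed_hosts(raw_message, zone) with the zone of your chosen blocklist
returns the offending hosts; a non-empty result would mark the message as spam.
The resolve parameter exists only so the lookups can be replaced with a stub
when testing without network access.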

This method uses the information that the spammer is trying to convey to you,
the URL they want you to go to, against them.  This method is surprisingly
effective at catching spam, and when combined with a local DNS cache (like
djbdns's dnscache) it's surprisingly fast as well.  I'll let you calculate your
own effectiveness and false-positive rates, but I think you'll like it.

-- Evan Langlois
evan at coolrunningconcepts.com


