Spam in images

Tom Anderson tanderso at oac-design.com
Tue Aug 1 22:13:05 CEST 2006


Tony L. Svanstrom wrote:
> On Mon, 31 Jul 2006 the voices made Bill Wohler write:
> BW> What's the current best practice with these? Classify as spam, or just
> BW> delete?
> 
>  If it's spam, then it's spam... My view is that if you've got a "learning"
> filter then just hand it all spam and ham, and let it sort it out; if it can't,
> then it's either broken by design or outdated.

I tend to agree with that view.  However, while I don't think that
Bogofilter is "broken" or "outdated", I do like to give it a little more
information in cases like this, where the message is just a big image,
relatively neutral headers, and perhaps a paragraph of random text.

Except perhaps for the religious or political variety, all spams share
one feature: a profit motive.  That means enticing or tricking you into
clicking a link, and that's a very, very powerful giveaway.  Bogofilter
already uses domain names as tokens, but spammers notoriously hop from
server to server, which defeats the "greylist" of domains your wordlist
builds up as you train on errors.  URIBLs (URI blocklists), however,
have the power of sheer numbers: reports from early victims and from
honeypots let them list many of these URLs.

To give Bogofilter the benefit of this extra research on each email, I
built a pre-filter called "stripsearch".  It parses the email body and
looks up every URL to see whether it is listed on a URIBL; if one is, it
inserts the token SPAM-ADDRESS into the email.  That gives the reader a
visual cue and gives Bogofilter extra information with which to make its
spamicity decision.
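A rough Python sketch of that kind of pre-filter follows.  It is not
stripsearch's code; the URIBL zone name, the URL regular expression, and
appending the tokens at the end of the body are assumptions made for
illustration.  Real URIBLs generally list registered domains, so a
production filter would first reduce each hostname to its registrable
domain and would check the specific 127.0.0.x return codes each list
documents.

```python
import re
import socket
from typing import Callable
from urllib.parse import urlparse

# Hypothetical zone for illustration; each real URIBL publishes its own
# zone name and usage policy.
URIBL_ZONE = "multi.uribl.com"

# Simplified: only http/https URLs, stopping at whitespace, quotes, brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def extract_hosts(body: str) -> list[str]:
    """Return the unique (lowercased) hostnames of URLs in the body, in order."""
    hosts: list[str] = []
    for url in URL_RE.findall(body):
        try:
            host = urlparse(url).hostname  # .hostname is already lowercased
        except ValueError:  # e.g. malformed IPv6 brackets
            continue
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def dns_listed(host: str, zone: str = URIBL_ZONE) -> bool:
    """Look up <host>.<zone>; any A record counts as 'listed' in this sketch."""
    try:
        socket.gethostbyname(f"{host}.{zone}")
        return True
    except (socket.gaierror, UnicodeError):  # NXDOMAIN or unencodable name
        return False


def tag_body(body: str, is_listed: Callable[[str], bool] = dns_listed) -> str:
    """Append one 'SPAM-ADDRESS <host>' line per listed host (assumed format)."""
    listed = [h for h in extract_hosts(body) if is_listed(h)]
    if not listed:
        return body
    tags = "".join(f"SPAM-ADDRESS {h}\n" for h in listed)
    return body.rstrip("\n") + "\n\n" + tags


if __name__ == "__main__":
    # Offline demo: a stub lookup stands in for the DNS query.
    print(tag_body("Buy now http://pills.example/x",
                   is_listed=lambda h: h == "pills.example"))
```

Passing `is_listed` as a parameter keeps the DNS query swappable, so the
tagging logic can be exercised without network access.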

You would add this step if using procmail:

   :0
   {
         :0 fbw
         | stripsearch

         :0 fw
         | bogofilter -uep
   }

When there are few other tokens, as is the case when the message is just
a big image, this can tip it over to the spam side.  Since I started
using it over a year ago, none of those image spams have got past me;
they're all correctly classified as spam.
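To see why one strong token matters so much when a message has few
others, here is a simplified Robinson-Fisher (chi-square) combining
calculation, the style of scoring Bogofilter's default algorithm is
based on.  The per-token probabilities are made-up illustrative values,
and real Bogofilter also smooths token probabilities and ignores tokens
too close to 0.5, so this only shows the trend.

```python
import math
from typing import Sequence


def chi2_survival(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for k in range(1, df // 2):
        term *= m / k  # builds e^-m * m^k / k! incrementally
        total += term
    return min(total, 1.0)


def fisher_spamicity(probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities f(w) in (0, 1) into one score."""
    n = len(probs)
    h = chi2_survival(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    s = chi2_survival(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + h - s) / 2.0  # 0.5 means "no evidence either way"


# Hypothetical near-neutral tokens from an image spam, then the same
# message with one extra strongly spammy token such as SPAM-ADDRESS.
neutral = [0.5, 0.55, 0.45, 0.6]
print(round(fisher_spamicity(neutral), 3))
print(round(fisher_spamicity(neutral + [0.99]), 3))
```

With the four near-neutral tokens alone the score lands around 0.55;
adding one token at 0.99 pushes it to roughly 0.88.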

You should have stripsearch in your Bogofilter /contrib/ directory, or 
you can download it here: http://orderamidchaos.com/bogofilter/stripsearch

Tom



