Spam in images

Tom Anderson tanderso at oac-design.com
Wed Sep 6 18:25:58 CEST 2006


Eric Wood wrote:
> I go around to many companies to work on their systems and I have yet to 
> meet a person that turned images off because they were concerned about 
> privacy.  Average users don't play on/off games with their email client, 
> they just want to see the email wherever it comes from and go home at 5:00 
> o'clock.

Thunderbird, Gmail, Outlook 2003, Hotmail, AOL, Yahoo!, and many other 
email readers have images turned off by default.  Probably 50% or more 
of people have images turned off (either manually or by default), which 
has become a major problem for newsletter and advertising publishers 
who rely on track-back bugs.

> Seems like a bunch of people disagree with your recommendation:
> Tigerdirect, Surplus Computers, BellSouth, Ebay, Office Depot/Max, Techmags, 
> Paypal and many other html emails I just scrolled through in my inbox all 
> reference (pull) images from their corporate webservers.   There may be one 
> oddball company sending me some inline images in there somewhere.....  And 
> like I said, it's usually people inlining their signature pictures or 
> corporate logos that could be a future problem if their email client does 
> "cids:" with a @. notion.

Indeed, I receive some of those emails, and I usually delete them rather 
than load the images to view their specials.  Only when I'm strictly in 
the market for something do I bother to load images to see what's on 
sale.  There go the impulse buys.

I don't have any emails on hand which use inline images since I don't 
archive, but I know various businesses from whom I've ordered stuff in 
the past have sent me flyers with images inline.

And it's getting easier and easier to do so, as nearly all bulk emailing 
software now includes the functionality.  Here are a few examples:

http://mojo.skazat.com/features/2_10/
http://www.designerfreesolutions.com/web/viewitem.asp?idproduct=1027
http://www.newsletteradministrator.com/features.html
http://homepage.mac.com/julifos/soft/newsletter/index.html

>>Nor do they warrant any special treatment.  Just train on errors.
> 
> Ahem.... but you are doing special treatment!  You had to run it through a 
> perl fork (stripsearch).
> 
> I'm just saying my "special treatment" is to look at cid: syntax and yours 
> is to grab an ASN and train on that.  I'm just not that advanced yet.

I wouldn't call that special treatment, as I send all email through 
stripsearch, which adds AS numbers to all emails.  This simply adds 
additional information about the sender which bogofilter may or may not 
use to classify it.  It doesn't judge the email for using formatting 
that is RFC-compliant (http://www.rfc-editor.org/rfc/rfc2111.txt) but 
selectively taboo.  Granted, a legitimate email author may move into a 
previously spammy ASN region, which would slightly affect their 
spamicities (although it is very unlikely to affect the bogosity of a 
ham message), so my method isn't perfect, nor did I claim that it was 
an end-all, be-all solution.  But based on current trends (email 
clients turning off images by default, newsletter software including 
inline capabilities), it seems certain that excluding emails with 
inline images will lead to false positives soon enough.  Could it be 
a useful prefilter?  Sure, seems to work well enough for you.  But given 
demographic and technological momentum toward mainstream use of inline 
images, I wouldn't risk using it to filter emails myself.
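
To make the ASN idea concrete, here is a rough Python sketch of that 
kind of tagging.  It is not stripsearch itself (that's a perl script and 
works differently); the prefix table, the X-ASN header name, and the 
function names are all made up for illustration, and a real lookup 
would come from routing data rather than a hard-coded dict.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical prefix-to-ASN table using documentation-only ranges.
# A real table would be built from BGP routing data.
ASN_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

# Matches bracketed IPv4 literals such as "[192.0.2.25]" in Received lines.
IPV4_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def lookup_asn(ip):
    """Return the ASN for an IP string, or None if unknown or malformed."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None
    for net, asn in ASN_TABLE.items():
        if addr in net:
            return asn
    return None


def tag_with_asn(raw):
    """Parse a raw message and add one X-ASN header per distinct ASN found."""
    msg = message_from_string(raw)
    seen = []
    for received in msg.get_all("Received", []):
        for ip in IPV4_RE.findall(received):
            asn = lookup_asn(ip)
            if asn is not None and asn not in seen:
                seen.append(asn)
    for asn in seen:
        msg["X-ASN"] = f"as{asn}"  # hypothetical header name
    return msg
```

The point is only that the tag is one more token for the classifier to 
weigh; nothing here accepts or rejects a message on its own.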

> PS.  My procmail rule caught 19 images spam while writing this message. 

Only one of my 10 spams in the past 10 minutes (which got past my 
DNSBLs, that is) had inline images (spamicity=0.521389, robx=0.41, 
spam_cutoff=0.42).  The added ASN was mildly helpful:

"rcvd:as4713"                       375  0.000057  0.000466  0.890230 +
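
For anyone wondering where the last number on that line comes from, 
here is a small Python sketch of Robinson's f(w) as I understand 
bogofilter to compute it.  I'm assuming the columns are n (token 
count), pgood, pbad, and fw, and I'm assuming robs=0.0178 since I 
didn't quote my robs setting above; robx=0.41 is the value given above.

```python
def robinson_fw(n, pgood, pbad, robx, robs):
    """Robinson's f(w): the raw spam probability p, pulled toward robx.

    n     -- number of times the token has been seen in training
    pgood -- fraction of ham messages containing the token
    pbad  -- fraction of spam messages containing the token
    robx  -- assumed spamicity for a never-seen token
    robs  -- weight given to robx relative to the observed data
    """
    p = pbad / (pgood + pbad)
    return (robs * robx + n * p) / (robs + n)


# Values read off the "rcvd:as4713" line; robs is an assumed setting.
fw = robinson_fw(n=375, pgood=0.000057, pbad=0.000466, robx=0.41, robs=0.0178)
print(f"{fw:.6f}")
```

With n=375 the robx term hardly matters, so the result sits near 0.89, 
in line with the reported 0.890230 given that the printed pgood and 
pbad are rounded.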

The email's "random" text was more damning though, with terms like 
"cried", "wont", "quarrelsome", etc., which rarely show up in my hams.  
But perhaps even more surprisingly, bogofilter is apparently already 
filtering inline images for me, with "mime:gif", 
"head:type", "mime:image", "mime:Content-ID", and "head:related" all 
very spammy.  So there's not much reason to add additional filtering in 
this regard anyway!
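
For reference, this is roughly what an "inline image" message looks 
like structurally, sketched with Python's standard email package.  The 
subject, the example.com domain, the placeholder GIF bytes, and the 
helper names are all invented for illustration.  The Content-ID header, 
the multipart/related container, and the image/gif part are presumably 
where tokens like the ones above come from.

```python
import re
from email.message import EmailMessage
from email.utils import make_msgid

# Matches cid: URLs (RFC 2111) inside HTML attribute values.
CID_REF_RE = re.compile(r"""cid:([^"'\s>]+)""")


def build_newsletter():
    """Build a message whose HTML body references an attached image by cid:."""
    msg = EmailMessage()
    msg["Subject"] = "Weekly specials"
    msg.set_content("Plain-text fallback for readers without HTML.")
    cid = make_msgid(domain="example.com")  # looks like "<...@example.com>"
    html = f'<html><body><img src="cid:{cid[1:-1]}"></body></html>'
    msg.add_alternative(html, subtype="html")
    fake_gif = b"GIF89a"  # placeholder bytes, not a real image
    # Turns the HTML part into multipart/related and attaches the image.
    msg.get_payload()[1].add_related(fake_gif, "image", "gif", cid=cid)
    return msg


def attached_image_cids(msg):
    """Content-IDs of all image parts, without the angle brackets."""
    return {
        str(part["Content-ID"]).strip("<>")
        for part in msg.walk()
        if part.get_content_maintype() == "image" and part["Content-ID"]
    }


def referenced_cids(msg):
    """cid: targets referenced from any text/html part."""
    refs = set()
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            refs.update(CID_REF_RE.findall(part.get_content()))
    return refs


newsletter = build_newsletter()
print(attached_image_cids(newsletter) == referenced_cids(newsletter))
```

A perfectly legitimate newsletter produces exactly this structure, which 
is why I'd rather let the classifier weigh those tokens than reject on 
them outright.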

Tom



