Spam in images

Mon Aug 7 18:22:58 CEST 2006

Tony L. Svanstrom wrote:
> On Mon, 7 Aug 2006 the voices made David Relson write:
> 
> DR> I've been seeing 2 distinct varieties of image spam recently.
> DR>
> DR> #1 contains image001.gif, a slew of innocuous words, and is 28Kb
> DR> to 29Kb long.
> DR>
> DR> #2 contains p.jpg and a bit of html, and is 26Kb long.
> DR>
> DR> #3 contains multiple images (each containing a line or so of the message)
> DR> and is approx 80Kb long.
> 
>  The ones that I'm getting are all about 31-33K, and contain multiple lines
> with garbage text (even the from-names and subjects seem to be random), and a
> single image.
> 
>  Training helps somewhat, but my main concern is that this is a long-term
> (months, up to a year) project with the goal of poisoning people's ham/spam
> databases.
>  Whatever it is, it'll probably change the way that many view/use
> statistical(ish) spamfilters today.
> 

Very interesting... the very next email in my mbox after yours was just 
such an email.  Bunch of random text, an image, and no urls!  Do 
spammers really expect people to type in a url they see in an image?  I 
can understand the impulse to click a link at the spur of the moment, 
but typing it into the browser just provides too much time to 
second-guess the wisdom of doing so.

In any event, I've never noticed these before because they were always 
classified as spam.  This one had a 0.82 spamicity the first time 
through, and now when I test it again, it shows 0.97, probably because 
this one and several others have gone through bogofilter with 
autoupdating of the database.  I'm very thankful for the -u feature.
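
To illustrate why repeated autoupdating can push a score up, here is a 
rough Python sketch of a Robinson-style per-token score.  It is not 
bogofilter's actual code: the function name, the message counts, and the 
robs/robx values (which I believe are close to bogofilter's defaults) 
are all assumptions for illustration only.

```python
def token_score(spam_count, ham_count, spam_msgs, ham_msgs,
                robs=0.0178, robx=0.52):
    """Robinson-style score for one token (illustrative sketch only).

    spam_count / ham_count: times the token was seen in spam / ham.
    spam_msgs / ham_msgs:   total messages trained as spam / ham.
    robs, robx:             assumed smoothing strength and prior score.
    """
    n = spam_count + ham_count
    if n == 0:
        # Never-seen token: fall back to the prior.
        return robx
    pbad = spam_count / spam_msgs    # fraction of spam containing the token
    pgood = ham_count / ham_msgs     # fraction of ham containing the token
    pw = pbad / (pbad + pgood)       # raw spamminess of the token
    # Blend the raw value with the prior, weighted by how much data we have.
    return (robs * robx + n * pw) / (robs + n)


# Hypothetical counts: a token seen once in ham, then in more and more spam
# (message totals held fixed to keep the example simple).
before = token_score(spam_count=1, ham_count=1, spam_msgs=1000, ham_msgs=1000)
after = token_score(spam_count=6, ham_count=1, spam_msgs=1000, ham_msgs=1000)
print(before, after)
```

Each additional spam registered for a token raises its score, so a 
message full of such tokens drifts toward 1.0 on a later test.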

A few of the "random" words came up hammy and about as many came up 
spammy, but most were neutral.  The spammy ones are the result of 
training on earlier emails that used "random" words unlikely to show up 
in my correspondence.  This filter-evasion technique obviously fails 
against well-trained databases.  Other spammy markers (low pgood, high 
pbad) include "mime:image", "mime:Content-ID", "mime:gif", "head:X-UIDL", 
"rcvd:as11456", and "rcvd:as721".

Those last two tokens are what really pushed it over the edge.  I have 
19 instances of "rcvd:as11456" and 70 instances of "rcvd:as721", both 
with zero pgood.  These are ASNs which were inserted into my emails by 
spamitarium, my head-processing pre-filter.  ASNs are autonomous system 
numbers: http://en.wikipedia.org/wiki/Autonomous_system_(Internet). 
Each one is assigned through a regional registry to a particular network 
operator, which creates a kind of regional greylist when I receive more 
spam than ham from some particular region or network operator.  A 
spammer can be assigned dynamic IPs through DHCP from their ISP, but 
they'll typically all have the same ASN.  AS11456 is assigned to NuVox 
Communications in Greenville, SC, which controls 622,298 IPs.  AS721 is 
assigned to DoD Network Information Center in Columbus, OH, which 
controls 89,608,854 IPs.  Clearly I don't receive any hams from those 
networks, but I've received a number of spams, so all future emails 
from them will be slightly suspicious in the eyes of Bogofilter.
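
For anyone curious what the tagging idea looks like, here is a minimal 
Python sketch.  It is NOT how spamitarium is implemented; the function 
name and the prefix table are hypothetical, and the prefixes are 
documentation-only address ranges that I have arbitrarily mapped to the 
two ASNs above.  A real tool would look the ASN up in routing data.

```python
import ipaddress

# HYPOTHETICAL prefix-to-ASN table. These are reserved documentation
# ranges, not the real announcements of AS11456 or AS721.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 11456,
    ipaddress.ip_network("198.51.100.0/24"): 721,
}


def asn_tag(ip):
    """Return a tag such as 'as11456' for an IP, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn in PREFIX_TO_ASN.items():
        # Prefer the most specific (longest) matching prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return f"as{best[1]}" if best else None


# Two different "dynamic" addresses inside the same network get the same tag.
print(asn_tag("192.0.2.10"), asn_tag("192.0.2.200"))
```

The point is that the tag stays constant while the sending IP changes, 
so once it lands in a Received header the filter can accumulate counts 
on a token like "rcvd:as11456" across many different addresses.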

If you would like to run spamitarium on your incoming mail, you should 
be able to find it in your Bogofilter /contrib/ directory, or you can 
download it here: http://orderamidchaos.com/bogofilter/spamitarium. 
Other than that, just keep training on errors.

Happy filtering,

Tom