Spam in images
Tom Anderson
tanderso at oac-design.com
Mon Aug 7 18:22:58 CEST 2006
Tony L. Svanstrom wrote:
> On Mon, 7 Aug 2006 the voices made David Relson write:
>
> DR> I've been seeing 3 distinct varieties of image spam recently.
> DR>
> DR> #1 contains image001.gif, a slew of innocuous words, and is 28Kb
> DR> to 29Kb long.
> DR>
> DR> #2 contains p.jpg and a bit of html, and is 26Kb long.
> DR>
> DR> #3 contains multiple images (each containing a line or so of the message)
> DR> and is approx 80Kb long.
>
> The ones that I'm getting are all about 31-33K and contain multiple lines
> of garbage text (even the from-names and subjects seem to be random) and a
> single image.
>
> Training helps somewhat, but my main concern is that this is a long-term
> (months, up to a year) project with the goal of poisoning people's ham/spam
> databases.
> Whatever it is, it'll probably change the way that many view/use
> statistical(ish) spamfilters today.
>
Very interesting... the very next email in my mbox after yours was just
such an email. A bunch of random text, an image, and no URLs! Do
spammers really expect people to type in a URL they see in an image? I
can understand the impulse to click a link on the spur of the moment,
but typing it into the browser leaves too much time to
second-guess the wisdom of doing so.
In any event, I've never noticed these before because they were always
classified as spam. This one had a 0.82 spamicity the first time
through, and now when I test it again, it shows 0.97, probably because
this one and several others have gone through bogofilter with
autoupdating of the database. I'm very thankful for the -u feature.
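
To illustrate why the score climbs as more of these get registered, here is a minimal sketch of a Robinson-style smoothed token score. The function name, the message counts, and the smoothing constants (robx, robs) are illustrative assumptions, not bogofilter's actual code or defaults.

```python
def token_score(bad, good, bad_msgs, good_msgs, robx=0.5, robs=1.0):
    """Smoothed spam probability for a single token.

    bad/good           -- how often the token was seen in spam/ham
    bad_msgs/good_msgs -- how many spam/ham messages were trained
    robx               -- assumed score for a never-seen token
    robs               -- assumed weight given to robx
    """
    n = bad + good
    if n == 0:
        return robx  # no evidence yet: stay neutral
    bad_rate = bad / bad_msgs
    good_rate = good / good_msgs
    p = bad_rate / (bad_rate + good_rate)
    # Pull the raw estimate toward robx while the token is still rare.
    return (robs * robx + n * p) / (robs + n)


# Each autoupdated spam adds to the spam count of the tokens it contains,
# so the same token scores higher on the next pass.
before = token_score(bad=2, good=0, bad_msgs=1000, good_msgs=1000)
after = token_score(bad=10, good=0, bad_msgs=1000, good_msgs=1000)
print(before, after)
```

A token seen only a couple of times stays fairly close to neutral; once it has been registered in spam repeatedly and never in ham, it moves toward 1.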
A few of the "random" words came up hammy, about as many came up spammy,
and most were neutral. The spammy ones are the result of training on
earlier emails of this kind, whose "random" words are not likely to show
up in my correspondence. This filter-evasion technique obviously fails for
well-trained databases. Other spammy markers (low pgood, high pbad)
include "mime:image", "mime:Content-ID", "mime:gif", "head:X-UIDL",
"rcvd:as11456", and "rcvd:as721".
Those last two tokens are what really pushed it over the edge. I have
19 instances of "rcvd:as11456" and 70 instances of "rcvd:as721", both
with zero pgood. These are ASNs which were inserted into my emails by
spamitarium, my header-processing pre-filter. ASNs are autonomous system
numbers: http://en.wikipedia.org/wiki/Autonomous_system_(Internet).
They are assigned to network operators by the regional Internet
registries, which gives me a kind of per-network greylist when I receive
more spam than ham from some particular region or network operator. A
spammer can be assigned dynamic IPs through DHCP from their ISP, but
those IPs will all have the same ASN. AS11456 is
assigned to NuVox Communications in Greenville, SC, which controls 622,298
IPs. AS721 is assigned to DoD Network Information Center in Columbus,
OH, which controls 89,608,854 IPs. So far I haven't received any ham from
those networks, but I've received a number of spams, so all future
emails from them will be slightly suspicious in the eyes of Bogofilter.
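
For anyone curious what tagging a Received header with an ASN might look like, here is a rough sketch. It is not spamitarium's code: the prefix table, the function names, and the header format are hypothetical, and the prefixes and AS numbers are taken from the ranges reserved for documentation. A real pre-filter would need an actual IP-to-ASN data source.

```python
import ipaddress
import re

# Hypothetical prefix-to-ASN table (documentation-only prefixes and ASNs).
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

# Matches a dotted-quad IPv4 address in square brackets, e.g. [192.0.2.17].
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def asn_for_ip(ip_text):
    """Return the ASN of the most specific matching prefix, or None."""
    ip = ipaddress.ip_address(ip_text)
    matches = [net for net in PREFIX_TO_ASN if ip in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return PREFIX_TO_ASN[best]


def annotate_received(header):
    """Append ' as<ASN>' to a Received header when the ASN is known."""
    found = IP_IN_BRACKETS.search(header)
    if found is None:
        return header
    try:
        asn = asn_for_ip(found.group(1))
    except ValueError:  # bracketed text was not a valid IPv4 address
        return header
    if asn is None:
        return header
    return f"{header} as{asn}"


line = "Received: from mail.example.net (mail.example.net [192.0.2.17]) by mx.example.org"
print(annotate_received(line))
```

Because the tag lands in the Received header, the filter can learn it as an ordinary token, and every address in the same network contributes to the same count no matter which IP the sender happens to have.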
If you would like to run spamitarium on your incoming mail, you should
be able to find it in your Bogofilter /contrib/ directory, or you can
download it here: http://orderamidchaos.com/bogofilter/spamitarium.
Other than that, just keep training on errors.
Happy filtering,
Tom