SPAN style="DISPLAY: none" spams

Tom Anderson tanderso at oac-design.com
Wed Jul 27 22:27:07 CEST 2005


----- Original Message ----- 
From: "Tony L. Svanstrom" <tony at moon.pp.se>
> You might get great results looking for certain patterns in the HTML, but then
> you're just playing the evolution game with the spammers

Nope, actually bogofilter works pretty well on my HTML so far.  I only 
perform a few "hacks" to make spammer tricks more noticeable to bogofilter.
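The message does not spell out what these "hacks" are, so the following is only a guess at the general idea, based on the subject line: a minimal Python sketch that tags any element styled display:none with a marker word before the mail is handed to the filter. The marker string, the function name, and the regex-based approach are all assumptions made for illustration, not the actual preprocessing described here.

```python
import re

# Hypothetical marker word (an assumption); anything that never shows up
# in legitimate mail would serve the same purpose.
MARKER = "hiddenspantrick"

# Matches an opening tag whose style attribute contains display:none,
# ignoring case and whitespace, e.g. <SPAN style="DISPLAY: none">.
HIDDEN_TAG = re.compile(
    r"<\s*(\w+)[^>]*\bstyle\s*=\s*[\"']?[^\"'>]*display\s*:\s*none[^>]*>",
    re.IGNORECASE,
)


def flag_hidden_spans(html: str) -> str:
    """Append MARKER after every tag styled display:none; leave the rest untouched."""
    return HIDDEN_TAG.sub(lambda m: m.group(0) + " " + MARKER + " ", html)


if __name__ == "__main__":
    sample = 'Buy <SPAN style="DISPLAY: none">xyzzy</SPAN>now'
    print(flag_hidden_spans(sample))
```

The point of such a rewrite is that the trick itself becomes an ordinary word the statistical filter can learn from, rather than something hardcoded as a spam verdict.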

> To me that isn't a beautiful solution; using (hardcoding) fads is IMNSHO at
> best an ugly hack.

That's ironic, because to me, deleting all HTML is the ugliest possible 
hack.

> If you want to do it right with HTML, then you basically need to build that
> part of the spamfilter on top of a webbrowser; which of course is far from
> impossible, but it's a huge mess that most people so far have wanted to stay as
> far away from as possible (and I don't blame them).

Parsing HTML may not be such a bad idea, but the beauty of statistical 
filtering is that you don't need to for the most part.  HTML does not cause 
any problems for my filter.

> Give it a cpl of years and we'll have spamfilters which are great at knowing
> what part of an e-mail is hidden and what is visible; by then we'll of course
> see a lot more spam using flash, java and even MP3 to get their message to the
> user.
>
> The more fancy stuff we allow in our e-mails, the easier we'll make it for the
> (future) spammers to play the evolution game; and we'll always be the ones
> catching up... we'll always be the ones at least a cpl of weeks behind the
> latest fad.

Yep, those Amish really know how to live life to the fullest... no traffic 
jams, electricity bills, credit cards, or annoying cell phones.  I wonder 
why anyone would want any of those "fancy" things.  As soon as you buy a 
computer or gadget, it's already obsolete... we'll always be a cpl weeks or 
months behind the latest fad.  We should just stop buying new stuff.  Those 
Luddites were on to something.  No new technology!  Technology sucks! 
Technology will bring apocalypse!  Fear technology!  Fear change!

> Sooner or later, of course, we'll see something like we do today with
> javascripts great at popping up ads even though a lot of people are using popup
> blockers

Mozilla works fine for me.  I don't get any popups.

> Today we see trojans, worms and viruses used to take over computers to send
> out spam, and popup ads; in the future we'll see a lot more discreet "malware"
> used primarily to insert spam directly into mailprograms after the filtering's
> done (and rewriting webpages so that you'll only see the ads that are meant to
> be on those pages, but with the affiliate ID of another person/group).
> These spam/malware will propagate slower; but using a NNTP-like solution along
> with portknocking and a social network-structure they'll be VERY
> (cost/bandwidth) effective.

I doubt it.  Spam is dying.  In another 3-5 years, it'll be gone altogether. 
Sooner if any of the M$ propaganda about Longhorn/Vista is true (BTW, buy 
MSFT while it's cheap).  Viruses/spyware have never ever been a problem for 
me.  Spam at this point has ceased being a problem for me.  It's only a 
matter of time before more people lick it as well.

> at "glossy brochure crap", and since I'm already living without it I know that
> by removing the HTML I won't miss anything which I'm not already just deleting
> (manually) without reading today.

I can get along just fine with a tent, good boots, and some flint & steel. 
But that doesn't mean it's a good lifestyle or that I won't "miss" anything. 
I would certainly miss lying back on my memory foam mattress and watching 
HDTV, fresh food in the fridge, and my jetted tub, among other things.  Just 
because my toilet sometimes clogs up doesn't mean I'm going to ditch indoor 
plumbing.

Deleting HTML is not the answer.  Addressing the abuse of HTML is an answer 
(such as by parsing it).  Detecting the presence of things within or without 
the HTML that give away the spam is another answer.  I do the latter by 
looking up spamvertized URLs in block lists.  No matter how much "innocent" 
or "benign" content a spammer tries to hide in HTML, a URL listed in a block 
list will always give it away.  The "innocent" text is usually classified as 
neutral anyway.  There hasn't been an eBay or PayPal spam yet which has 
gotten past this, even though I have received similar ham from those 
sources.
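The block-list lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the setup actually used here: the message does not say which lists are consulted, so the `multi.surbl.org` zone is only an example default, and the function names are made up. The sketch also queries the full hostname, whereas URI block lists generally expect the registered domain, so a real setup would reduce the hostname first.

```python
import re
import socket
from urllib.parse import urlparse

# Rough URL matcher: stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_hosts(text):
    """Return the distinct lowercase hostnames of all http(s) URLs in text."""
    hosts = set()
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host:
            hosts.add(host.lower())
    return hosts


def is_listed(host, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """True if host.zone resolves to a 127.x.x.x address.

    The zone is an assumed example; `resolve` is injectable so the
    function can be exercised without touching the network.
    """
    try:
        answer = resolve(f"{host}.{zone}")
    except OSError:  # includes socket.gaierror (name not found)
        return False
    return answer.startswith("127.")


def listed_hosts(text, **kwargs):
    """Hostnames from text that the block list reports as listed."""
    return sorted(h for h in url_hosts(text) if is_listed(h, **kwargs))
```

Because the check only looks at the URLs, any amount of "innocent" filler text around them makes no difference to the result.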

Tom




