Stripsearch

Wed Jun 22 15:25:14 CEST 2005

----- Original Message ----- 
From: "Chris Fortune" <cfortune at telus.net>

>> That's an accuracy of 99.87% and growing.
> could this be the end of spam as we know it?

As we know it perhaps, but not an end in general.  I'm comfortably receiving 
a false negative or unsure every 2-3 days.  I just got one this morning 
which looked like a virus since it didn't contain any URLs, but had a PIF 
attachment.  I should probably run an antivirus in my pipeline too.  In any 
event, this is like it was back around 1995, before spam became the huge 
problem it is today for most people.  I doubt spam will ever be eliminated 
entirely since the lines between legitimate advertising, annoying 
acquaintances, etc., and spam are rather blurry.  Even receiving one 
unwanted message a day seems reasonable to me.  At that rate, the old 
argument of "just delete it" holds water again.  But that doesn't negate the 
fact that I have to repel nearly 1000 spams a day that I never see.  That's 
just a huge waste of resources.

My hope is that more and more people will attain the 99.8%+ filtration rate, 
and it will eventually become too expensive (even at fractions of a penny 
per email) for spammers to stay in business.  If a spammer emailed everyone 
in America (~300,000,000 assuming everyone had an email address), and 99.8% 
were filtered, 600,000 would get through.  If the spam received a 0.1% 
response rate (very generous), there would be 600 respondents.  If the 
product being marketed then received a 10% buy rate (very generous), there 
would be 60 sales.  Assuming a cost of $10/GB for bandwidth and an average 
email size of 2KB, sending all 300,000,000 messages moves about 600GB and 
costs about $6,000, so the product being marketed would have to sell for 
$100 profit to break even.  Accounting for the cost of the product to the spammer 
as well as other operational costs, the actual price would likely need to be 
upwards of $150-$200.  The pricier the product, the smaller the chance of 
people buying it.  That's a lot of assumptions, but the point is that there 
is definitely a limit to how much a spammer can profitably spam.  The fact 
that spammers are now largely trying to use zombies for spamming indicates 
that we may have reached that point already.  And given that the virus 
problem is a better defined problem which will hopefully tend to fade away 
as Microsoft gets better at security like they've promised, this new spammer 
tack should indicate that they've lost the wind in their sails for good. 
The only move for them to make is to better target their spams to decrease 
their filtration rates and to increase their response rates, and the better 
they get at targeting, the less spam-like it becomes.
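
To make the arithmetic above easy to replay with other assumptions, here is 
a rough sketch in Python.  The function and parameter names are made up for 
illustration, and it assumes decimal units (1GB = 1,000,000KB) and that the 
spammer pays bandwidth for every message sent, filtered or not.

```python
def spam_break_even(recipients, filter_rate, response_rate, buy_rate,
                    cost_per_gb, email_size_kb):
    """Estimate the profit per sale a spammer needs just to cover bandwidth.

    All names here are hypothetical; the numbers are the ones assumed
    in the message, not measured values.
    """
    delivered = recipients * (1 - filter_rate)      # messages that get past filters
    responders = delivered * response_rate          # people who respond at all
    sales = responders * buy_rate                   # responders who actually buy
    # Bandwidth is paid for every message sent, not only the delivered ones.
    gigabytes_sent = recipients * email_size_kb / 1_000_000
    bandwidth_cost = gigabytes_sent * cost_per_gb
    return {
        "delivered": delivered,
        "responders": responders,
        "sales": sales,
        "bandwidth_cost": bandwidth_cost,
        "break_even_profit": bandwidth_cost / sales,
    }


if __name__ == "__main__":
    result = spam_break_even(
        recipients=300_000_000,
        filter_rate=0.998,
        response_rate=0.001,
        buy_rate=0.10,
        cost_per_gb=10.0,
        email_size_kb=2,
    )
    for name, value in result.items():
        print(f"{name}: {value:,.2f}")
```

With the figures from the paragraph above, this works out to about 60 sales 
against roughly $6,000 of bandwidth, or about $100 per sale.  Raising 
filter_rate to 0.999 halves the sales and doubles the break-even figure.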

>> The SCAM-ADDRESS token is not generated from RBLs, it is generated when 
>> the
>> href and the visible text do not have the same domain.  If you can send 
>> me
>> both of these URLs, I'll check to see if this is a bug.
>
> Check attachement [ 
> Annual_Privacy_and_Electronic_Fund_Transfer_Rights_Notice.eml ]

I didn't get an attachment.
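
For reference, the kind of check quoted above (href domain versus visible 
text domain) can be sketched roughly as follows.  This is not Stripsearch's 
actual code: the class and function names are hypothetical, and the "last 
two labels" domain comparison is a naive assumption that gets suffixes like 
co.uk wrong.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Matches something that looks like a host name in the visible link text.
DOMAIN_RE = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})",
                       re.IGNORECASE)


def base_domain(host):
    # Naive assumption: the last two labels identify the domain.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


class LinkMismatchFinder(HTMLParser):
    """Collect <a> links whose visible text names a different domain."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # visible text collected inside that <a>
        self.mismatches = []   # (href, visible_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            href_host = urlparse(self._href).hostname
            shown = DOMAIN_RE.search(text)
            # Only compare when the visible text itself looks like a domain.
            if href_host and shown:
                if base_domain(href_host) != base_domain(shown.group(1)):
                    self.mismatches.append((self._href, text))
            self._href = None


def find_mismatched_links(html):
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return finder.mismatches
```

Links whose visible text is plain wording ("click here") are skipped, since 
there is no second domain to compare against.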

>> All in all, it's working out great so far.  I just hope I can find the 
>> time
>> sometime soon to integrate it with spamitarium.
>
> I tried running stripsearch against both head and body.  There's a 
> theoretical danger of false positives, but it has the added bonus
> of "messing up" and disabling spam email so that it doesn't execute in 
> security-challenged clients like Outlook Express.

That's not such a good idea.  In the event that a ham contains blocked 
addresses in the header, it'll be difficult to determine that it was a ham, 
even if it's not a false positive as far as bogofilter is concerned.  The 
reason for needing to parse the header is so that I can acquire the MIME 
separator and parse the different body parts independently.  I wouldn't do 
any URL replacement in the header.  I could look up addresses in the header 
and do something with that (set an X-header line or something), but it's 
more efficient to do DNSBL lookups on the sender at SMTP time anyway.  There 
would be no incentive for spammers to list blocked URLs in any other part of 
the header.  The only places where block lists are useful are on the sender 
(done at SMTP time) and on the URLs they want you to click on in the body 
(which is what Stripsearch takes care of).
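
As an illustration of reading the header only to find the MIME separator 
and then handling each body part on its own, here is a minimal sketch using 
Python's standard email package.  It is an assumption about how this could 
be done, not how Stripsearch does it, and the function name is hypothetical.

```python
import email


def text_body_parts(raw_message):
    """Return the MIME boundary and the decoded text parts of a message.

    The header is parsed only to learn the structure; nothing in it is
    modified.  Non-text parts (attachments, images) are skipped.
    """
    msg = email.message_from_string(raw_message)
    boundary = msg.get_boundary()   # None when the message is not multipart
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue                # containers hold no body text themselves
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "ascii"
        # An unknown charset name would raise LookupError; not handled here.
        parts.append((part.get_content_type(),
                      payload.decode(charset, errors="replace")))
    return boundary, parts
```

Any URL lookup or replacement would then run over the returned text parts 
only, leaving the header lines exactly as they arrived.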

Tom