Stripsearch
Tom Anderson
tanderso at oac-design.com
Wed Jun 22 15:25:14 CEST 2005
----- Original Message -----
From: "Chris Fortune" <cfortune at telus.net>
>> That's an accuracy of 99.87% and growing.
> could this be the end of spam as we know it?
As we know it perhaps, but not an end in general. I'm comfortably receiving
a false negative or unsure every 2-3 days. I just got one this morning
which looked like a virus since it didn't contain any URLs, but had a PIF
attachment. I should probably run an antivirus in my pipeline too. In any
event, this is like it was back around 1995, before spam became the huge
problem it is today for most people. I doubt spam will ever be eliminated
entirely, since the lines between legitimate advertising, annoying
acquaintances, etc., and spam are rather blurry. Even receiving one
unwanted message a day seems reasonable to me. At that rate, the old
argument of "just delete it" holds water again. But that doesn't negate the
fact that I have to repel nearly 1000 spams a day that I never see. That's
just a huge waste of resources.
My hope is that more and more people will attain the 99.8%+ filtration rate,
and it will eventually become too expensive (even at fractions of a penny
per email) for spammers to stay in business. If a spammer emailed everyone
in America (~300,000,000 assuming everyone had an email address), and 99.8%
were filtered, 600,000 would get through. If the spam received a 0.1%
response rate (very generous), there would be 600 respondents. If the
product being marketed then received a 10% buy rate (very generous), there
would be 60 sales. Assuming a cost of $10/GB for bandwidth and an average
email size of 2 KB, sending all 300,000,000 messages takes about 600 GB and
costs about $6,000, so the product being marketed would have to earn $100
profit per sale to break even. Accounting for the cost of the product to the spammer
as well as other operational costs, the actual price would likely need to be
upwards of $150-$200. The pricier the product, the smaller the chance of
people buying it. That's a lot of assumptions, but the point is that there
is definitely a limit to how much a spammer can profitably spam. The fact
that spammers are now largely trying to use zombies for spamming indicates
that we may have reached that point already. And given that the virus
problem is a better defined problem which will hopefully tend to fade away
as Microsoft gets better at security like they've promised, this new spammer
tack should indicate that they've lost the wind in their sails for good.
The only move for them to make is to better target their spams to decrease
their filtration rates and to increase their response rates, and the better
they get at targeting, the less spam-like it becomes.
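
To make the arithmetic above easy to rerun with different guesses, here is a
rough Python sketch. Every default is just one of the assumptions from the
paragraph above (300,000,000 recipients, 99.8% filtered, 0.1% response, 10%
buy rate, $10/GB, 2 KB per message); the function name and parameter names
are made up for illustration, and 1 GB is taken as 1,000,000 KB.

```python
def spam_break_even(
    recipients=300_000_000,   # roughly everyone in America
    filter_rate=0.998,        # fraction of spam stopped by filters
    response_rate=0.001,      # 0.1% of delivered spam gets a response
    buy_rate=0.10,            # 10% of respondents actually buy
    cost_per_gb=10.0,         # bandwidth cost in dollars per GB
    email_kb=2.0,             # average message size in KB
):
    """Return the intermediate figures and the profit per sale needed to break even."""
    delivered = recipients * (1 - filter_rate)
    respondents = delivered * response_rate
    sales = respondents * buy_rate
    # The spammer pays to send every message, filtered or not.
    bandwidth_gb = recipients * email_kb / 1_000_000
    cost = bandwidth_gb * cost_per_gb
    return {
        "delivered": delivered,
        "respondents": respondents,
        "sales": sales,
        "cost": cost,
        "profit_per_sale": cost / sales,
    }


if __name__ == "__main__":
    for name, value in spam_break_even().items():
        print(f"{name}: {value:,.0f}")
```

Raising filter_rate to 0.999 halves the number of sales while the sending
cost stays the same, so the break-even profit per sale roughly doubles to
about $200.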
>> The SCAM-ADDRESS token is not generated from RBLs, it is generated when the
>> href and the visible text do not have the same domain. If you can send me
>> both of these URLs, I'll check to see if this is a bug.
>
> Check attachment [
> Annual_Privacy_and_Electronic_Fund_Transfer_Rights_Notice.eml ]
I didn't get an attachment.
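
For reference, the href-versus-visible-text comparison described in the
quoted text above could be sketched roughly like this in Python. This is only
an illustration, not Stripsearch's actual code: the LinkChecker class, the
scam_links helper, the URL_RE pattern and the naive "last two labels"
base_domain rule are all assumptions made up for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Naive pattern for something that looks like a hostname in the link text.
URL_RE = re.compile(r"(?:https?://)?((?:[\w-]+\.)+[a-z]{2,})", re.I)


def base_domain(host):
    # Assumption: the last two labels identify the domain.
    # This is wrong for suffixes such as co.uk; a real tool needs a suffix list.
    return ".".join(host.lower().split(".")[-2:])


class LinkChecker(HTMLParser):
    """Collect (href, visible host) pairs whose domains differ."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.text = []
        self.mismatches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            href_host = urlsplit(self.href).hostname
            shown = URL_RE.search("".join(self.text))
            # Only flag links whose visible text itself looks like a hostname.
            if href_host and shown:
                if base_domain(href_host) != base_domain(shown.group(1)):
                    self.mismatches.append((self.href, shown.group(1)))
            self.href = None


def scam_links(html):
    checker = LinkChecker()
    checker.feed(html)
    checker.close()
    return checker.mismatches
```

A link whose text is ordinary words ("click here") is not flagged by this
sketch, since there is no visible domain to compare against.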
>> All in all, it's working out great so far. I just hope I can find the time
>> sometime soon to integrate it with spamitarium.
>
> I tried running stripsearch against both head and body. There's a
> theoretical danger of false positives, but it has the added bonus
> of "messing up" and disabling spam email so that it doesn't execute in
> security-challenged clients like Outlook Express.
That's not such a good idea. In the event that a ham contains blocked
addresses in the header, it'll be difficult to determine that it was a ham,
even if it's not a false positive as far as bogofilter is concerned. The
reason for needing to parse the header is so that I can acquire the MIME
separator and parse the different body parts independently. I wouldn't do
any URL replacement in the header. I could look up addresses in the header
and do something with that (set an X-header line or something), but it's
more efficient to do DNSBL lookups on the sender at SMTP time anyway. There
would be no incentive for spammers to list blocked URLs in any other part of
the header. The only places where block lists are useful are the sender
(checked at SMTP time) and the URLs they want you to click on in the body
(which is what Stripsearch takes care of).
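
To illustrate the "read the header only to find the MIME separator, then
handle each body part on its own" idea, here is a minimal Python sketch using
the standard email package. It is not how Stripsearch is implemented; the
body_urls function and the URL_RE pattern are hypothetical, and it only
collects URLs from text parts rather than replacing anything.

```python
import re
from email import message_from_string

# Rough pattern for http/https URLs; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.I)


def body_urls(raw_message):
    """Return (content type, URL) pairs found in text body parts only."""
    # The parser reads the header to learn the multipart boundary;
    # header values themselves are never searched for URLs.
    msg = message_from_string(raw_message)
    found = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to a byte-preserving decoding.
            text = payload.decode("latin-1")
        found.extend((part.get_content_type(), url) for url in URL_RE.findall(text))
    return found
```

Each URL returned this way could then be looked up against a block list,
while anything that merely appears in a header line is left alone.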
Tom