Prediction [was: spam addrs]

Tue Jun 29 20:01:57 CEST 2004

From: "David Relson" <relson at osagesoftware.com>
> I've seen many software projects where time was spent trying to
> anticipate everything the user wanted.  I've seen others where the time
> was spent addressing the needs.  The "needs" based projects tended to be
> more successful than the "wants" projects -- because it's impossible to
> anticipate what is really valuable.

I don't think this is a matter of wants and needs.  We're not talking about
adding new functionality; we're talking about making sure that a proposed
functionality actually does what it's supposed to do, namely, output the
correct IP address of the sender.

> So I'm willing to deal with what actually affects people and am not
> willing to try to predict future spammer tricks.

Most of the security problems in software today stem from developers
assuming that users will follow the intended path through the software.
Crackers try all of the unintended paths.  Proper bounds checking, input
validation, and the like would close most of these holes.  The same issue
applies here.  You are assuming that spammers will be kind and gentle with
bogofilter and feed it only the data it expects.  That's not a safe
assumption: spammers are actively trying to defeat filtering software.

> 'Tis nice that spamitarium can correctly process
>
>   Received: from helo-[1.2.3.4] 65.126.137.220 as209
>     by oac-design.com 216.109.145.120
>
> but what MTA delivers this format (unbracketed address)?  I'm interested
> in "out of the box" delivery formats, not "I'm going to customize _my_
> MTA's format so that it's different."

Well, I know that SquirrelMail does, and maybe others.  But that wasn't my
point here... what I'm saying is that any spammer can open an SMTP
conversation with "HELO [198.187.190.55]", and MTAs (sendmail at least) will
accept that as a valid HELO string.  The resulting sendmail Received line,
"out of the box," will look like this:

Received: from [198.187.190.55] (spammer.com [208.254.3.160]) ...

Now, if bogofilter looks for the first square-bracketed IP address, it's
going to return 198.187.190.55.  This of course was forged by the spammer,
and the bogofilter user will end up blocking email from disney.com instead
of spammer.com.  Let's say you're using Exim instead... now the received
line might look like:

Received: from [208.254.3.160] (helo=[198.187.190.55]) ...

I don't use Exim, so I don't know whether it accepts brackets in the HELO
string, but as you can see, the real IP and the HELO string have swapped
places: the real IP now comes first in the "from" portion instead of last.
Looking only at the front, only at the end, or only at bracketed IPs won't
work unless you know the format of the Received line your MTA writes.
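To make the failure concrete, here is a minimal Python sketch of the "first square-bracketed IP" strategy and its mirror image, "last bracketed IP", run against the two Received lines quoted above.  The function names are made up for illustration; this is not bogofilter's actual code.

```python
import re

# Hypothetical illustration, not bogofilter's code: any dotted quad in [brackets].
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def first_bracketed_ip(received):
    """Return the first bracketed IPv4 address in a Received line, or None."""
    found = BRACKETED_IP.findall(received)
    return found[0] if found else None

def last_bracketed_ip(received):
    """Return the last bracketed IPv4 address in a Received line, or None."""
    found = BRACKETED_IP.findall(received)
    return found[-1] if found else None

# The two example lines: the real peer is 208.254.3.160,
# while [198.187.190.55] is the HELO string the spammer chose.
SENDMAIL_LINE = "Received: from [198.187.190.55] (spammer.com [208.254.3.160]) ..."
EXIM_LINE = "Received: from [208.254.3.160] (helo=[198.187.190.55]) ..."

if __name__ == "__main__":
    for name, line in (("sendmail", SENDMAIL_LINE), ("Exim", EXIM_LINE)):
        print(name, "first:", first_bracketed_ip(line),
              "last:", last_bracketed_ip(line))
```

Each strategy gets the real peer for one MTA and the forged HELO for the other, which is exactly why position alone can't be trusted.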

My point regarding spamitarium was that the regexes I used to decide which
string is the HELO and which is the IP worked even with a bracketed HELO
string (I wasn't confident they would, but in testing they did).  Unless
bogofilter does something similar, I don't see how anyone can be even
slightly confident in the IP it returns.
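For contrast, a minimal sketch of the format-aware approach: one pattern per known Received layout, each stating explicitly which field is the HELO and which is the peer IP, plus an octet-range check as basic input validation.  The patterns are hypothetical, written only for the two example lines in this message; they are not spamitarium's actual regexes, and a real MTA (Exim in particular) may use a different layout.

```python
import ipaddress
import re

# Hypothetical per-format patterns keyed to the two example lines in this
# message; they are NOT spamitarium's regexes and do not cover every MTA.
IP = r"\d{1,3}(?:\.\d{1,3}){3}"

# sendmail style: from HELO ([rdns ][IP])  -> peer IP is inside the parentheses
SENDMAIL = re.compile(
    r"^Received:\s+from\s+(?P<helo>\S+)\s+\((?:(?P<rdns>\S+)\s+)?\[(?P<ip>" + IP + r")\]\)"
)
# Exim style, as quoted above: from [IP] (helo=HELO)  -> peer IP comes first
EXIM = re.compile(
    r"^Received:\s+from\s+\[(?P<ip>" + IP + r")\]\s+\(helo=(?P<helo>[^)\s]+)\)"
)

def peer_ip(received):
    """Return (ip, helo) for a recognised layout, or None if unsure."""
    for pattern in (SENDMAIL, EXIM):
        m = pattern.match(received)
        if m is None:
            continue
        try:
            # Input validation: reject impossible octets such as 999.1.2.3.
            ipaddress.IPv4Address(m.group("ip"))
        except ipaddress.AddressValueError:
            return None
        return m.group("ip"), m.group("helo")
    # Unknown layout: refuse to guess rather than return a dubious address.
    return None

if __name__ == "__main__":
    print(peer_ip("Received: from [198.187.190.55] (spammer.com [208.254.3.160]) ..."))
    print(peer_ip("Received: from [208.254.3.160] (helo=[198.187.190.55]) ..."))
```

The design choice here is that an unrecognised line yields None instead of falling back to "first bracketed IP", so the caller gets no answer rather than a forged one.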

Why would you want to release new functionality with known vulnerabilities,
and have to patch it later when spammers start taking advantage of them,
rather than address the issue now?

> The present cvs code includes a square bracket test, which removes the
> need for the "received state" state machine, but doesn't have a
> whitespace check.  If you want a copy of the patch to update 0.92.0 to
> cvs, let me know.

I'm still using 0.17.5, and I don't intend to use this functionality anyway,
so I don't need the patch.  I just don't like the idea of bogofilter
returning dubious data that implies a purpose it isn't suited for.

Tom