spam addrs

Tue Jun 15 14:10:55 CEST 2004

On Mon, 2004-06-14 at 19:36, David Relson wrote:
> Received: (qmail 937 invoked from network); 2 Feb 2004 19:21:52 -0000
> Received: from unknown (HELO localhost) (127.0.0.1)
>   by localhost with SMTP; 2 Feb 2004 19:21:52 -0000
> Received: from natmout00.rzone.de (natmout00.rzone.de [81.169.145.163])
> 	by mail.nn7.de (8.12.10/8.12.10) with ESMTP id i12JLAWl009417
> 	for <bugreports at nn7.de>; Mon, 2 Feb 2004 20:21:10 +0100 (MET)
> 
> The third version excludes "but not 127.0.0.1".

What do you mean by that last statement... do you mean it excludes
127.0.0.1 from being entered in the log?  If you're going to do that,
you should exclude other local and reserved addresses as well.  

Local and reserved addresses look like:
/^((?:127\.)|(?:10\.)|(?:172\.(?:1[6-9]|2[0-9]|31)\.)|(?:192\.168\.)|(?:169\.254\.))/
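
As a rough sketch of how that pattern could be applied (the helper name
is_local_or_reserved is made up for illustration, it is not part of
bogofilter or spamitarium), assuming the string has already been pulled
out of a Received line:

```perl
use strict;
use warnings;

# The local/reserved pattern from above, compiled once.
my $LOCAL_RE = qr/^((?:127\.)|(?:10\.)|(?:172\.(?:1[6-9]|2[0-9]|31)\.)|(?:192\.168\.)|(?:169\.254\.))/;

# Hypothetical helper: 1 if the address starts with a loopback, private
# or link-local prefix, 0 otherwise.
sub is_local_or_reserved
{
	my ($ip) = @_;
	return ($ip =~ $LOCAL_RE)? 1:0;
}

print "$_: ", is_local_or_reserved($_), "\n"
	for qw(127.0.0.1 10.1.2.3 172.16.0.9 172.32.0.9 192.168.1.1 81.169.145.163);
```

Note that the pattern only looks at the leading octets, so it should be
applied to strings that are already known to be well-formed IPs.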

After excluding the above, valid IPs look like:
/^((?:0?0?\d|[01]?\d\d|2[0-4]\d|25[0-5])\.(?:0?0?\d|[01]?\d\d|2[0-4]\d|25[0-5])\.(?:0?0?\d|[01]?\d\d|2[0-4]\d|25[0-5])\.(?:0?0?\d|[01]?\d\d|2[0-4]\d|25[0-5]))$/
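
The same pattern is easier to read when the octet part is factored out.
This is only a sketch of one way to write it; the names $OCTET and
is_valid_ip are invented for the example:

```perl
use strict;
use warnings;

# One octet, 0-255, with optional leading zeros (same alternation as above).
my $OCTET = qr/(?:0?0?\d|[01]?\d\d|2[0-4]\d|25[0-5])/;

# Four octets separated by dots, anchored at both ends.
my $VALID_IP_RE = qr/^($OCTET\.$OCTET\.$OCTET\.$OCTET)$/;

# Hypothetical helper: 1 if the whole string is a dotted quad, 0 otherwise.
sub is_valid_ip
{
	my ($ip) = @_;
	return ($ip =~ $VALID_IP_RE)? 1:0;
}

print "$_: ", is_valid_ip($_), "\n"
	for qw(81.169.145.163 001.002.003.004 256.1.1.1 1.2.3 1.2.3.4.5);
```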

What would happen in the following case:

i12JLAWl009417
        for <4.3.2.1 at 1.2.3.4.nn7.com>; Mon, 2 Feb 2004 20:21:10 +0100 (MET)

In this case, IP-like strings may occur in the "by" or "for" portions of
the received line.

Or in either of these two lines:
Received: from 1.2.3.4 ([5.6.7.8] ident=4.3.2.1)
Received: from 1.2.3.4 (proxying for 5.6.7.8) (user 4.3.2.1)

In both of those cases (which are valid MTA Received lines), even after
excluding the "by" and "for" portions, the IP you want (5.6.7.8) sits
between two other IP-like strings.
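
To see the problem concretely, here is a small sketch that just collects
every IP-like string from a line.  The helper name ip_like_strings is
made up for this example, and the pattern is deliberately naive (it does
no range checking):

```perl
use strict;
use warnings;

# Hypothetical helper: return every dotted-quad-looking string in a line,
# in the order in which it appears.
sub ip_like_strings
{
	my ($line) = @_;
	return $line =~ /\b(\d{1,3}(?:\.\d{1,3}){3})\b/g;
}

my @lines = (
	'Received: from 1.2.3.4 ([5.6.7.8] ident=4.3.2.1)',
	'Received: from 1.2.3.4 (proxying for 5.6.7.8) (user 4.3.2.1)',
);

for my $line (@lines) {
	my @found = ip_like_strings($line);
	print scalar(@found), " candidates: @found\n";
}
```

Both lines yield three candidates, and neither "take the first" nor
"take the last" picks 5.6.7.8.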

You have to consider all of the possible places that an IP-like string
may occur.  Assuming that all emails will behave according to your
expectations is dangerous, especially since spammers are actively trying
to foil your efforts.

> In cases where the message is forwarded multiple times by internal mail
> servers, this method of identifying the ip address will likely identify
> one of those servers, making the saved information not useful.  When/if
> someone has more complex needs, they can ask for help in implementing
> what they actually need.

If you exclude all of the IPs in the "local" regex above, then you'll
eliminate some of that possibility.  For further assurance, you could
exclude logging of any IP in the same class B as the receiving machine.
You can be relatively certain that nobody will spam you from within
your own company or your own ISP.  I've also done this in spamitarium:

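sub is_same_class_B
{
	# Compare the first two octets (the class B, i.e. /16, prefix).
	# Anything that doesn't start like an IP never counts as a match.
	my ($ip1,$ip2) = @_;
	my ($net1) = $ip1 =~ /^(\d{1,3}\.\d{1,3}\.)/ or return 0;
	my ($net2) = $ip2 =~ /^(\d{1,3}\.\d{1,3}\.)/ or return 0;

	return ($net1 eq $net2)? 1:0;
}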

If you've found a valid IP in the "from" portion of a Received line that
matches a known pattern, and that IP is neither local nor reserved and
lies outside the receiving server's class B network, then you can be
relatively certain that it belongs to a spammer, a spammer's immediate
network, or an open relay.
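
Put together, the checks could look something like the sketch below.
This is not the actual spamitarium code: is_loggable_ip and same_class_B
are names invented for the example, and the receiving server's address
(192.0.2.10, a documentation address) is an assumption.

```perl
use strict;
use warnings;

my $LOCAL_RE = qr/^((?:127\.)|(?:10\.)|(?:172\.(?:1[6-9]|2[0-9]|31)\.)|(?:192\.168\.)|(?:169\.254\.))/;
my $OCTET    = qr/(?:0?0?\d|[01]?\d\d|2[0-4]\d|25[0-5])/;
my $VALID_IP_RE = qr/^($OCTET\.$OCTET\.$OCTET\.$OCTET)$/;

# 1 if both addresses share their first two octets, 0 otherwise.
sub same_class_B
{
	my ($ip1,$ip2) = @_;
	my ($net1) = $ip1 =~ /^(\d{1,3}\.\d{1,3}\.)/ or return 0;
	my ($net2) = $ip2 =~ /^(\d{1,3}\.\d{1,3}\.)/ or return 0;
	return ($net1 eq $net2)? 1:0;
}

# Hypothetical helper: 1 if $ip is worth logging, given the receiving
# server's own address, 0 otherwise.
sub is_loggable_ip
{
	my ($ip,$receiver_ip) = @_;
	return 0 unless $ip =~ $VALID_IP_RE;         # not a well-formed IP
	return 0 if $ip =~ $LOCAL_RE;                # local or reserved
	return 0 if same_class_B($ip,$receiver_ip);  # same /16 as receiver
	return 1;
}

my $receiver = '192.0.2.10';   # assumed address of the receiving server
print "$_: ", is_loggable_ip($_,$receiver), "\n"
	for qw(81.169.145.163 127.0.0.1 192.0.5.9 999.1.1.1);
```

This only decides whether a single candidate is worth logging; choosing
the right candidate out of a Received line is a separate problem.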

Tom