spam addrs

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Mon Jul 5 00:03:37 CEST 2004


On Mon, 28 Jun 2004, David Relson wrote:

> Remember, your trusted relay list hypothesized having a list.  If the
> list included both "1.2.3.4" and "127.0.0.1", then 127.0.0.1 would never
> be returned as the spam address.

127.0.0.1 would be returned as the spam address if and only if the list of
relays extracted from the Received headers, let's call it L, satisfied
the following two conditions:
1. every entry of L was a trusted address,
2. the last entry of L was 127.0.0.1.

Indeed, spam sent from the local machine (or, more precisely, from any
trusted relay) with faked Received headers could confuse the algorithm.
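The two conditions above imply a particular selection rule. Here is a minimal Python sketch of it, assuming L is in header order (newest hop first) and that the rule is "return the first untrusted entry, or the last entry if every entry is trusted". This is inferred from the two conditions, not taken from Bogofilter's source, and `pick_spam_address` is a hypothetical name.

```python
from typing import AbstractSet, Optional, Sequence


def pick_spam_address(relays: Sequence[str], trusted: AbstractSet[str]) -> Optional[str]:
    """Hypothetical selection rule inferred from the two conditions:
    walk L (relay addresses, newest Received header first) and return
    the first untrusted entry; if every entry is trusted, return the last one."""
    for addr in relays:
        if addr not in trusted:
            return addr
    # Every relay is trusted: fall back to the last (oldest) entry of L.
    return relays[-1] if relays else None


trusted = {"1.2.3.4", "127.0.0.1"}
print(pick_spam_address(["1.2.3.4", "127.0.0.1"], trusted))              # 127.0.0.1
print(pick_spam_address(["1.2.3.4", "10.0.0.5", "127.0.0.1"], trusted))  # 10.0.0.5
```

The first call satisfies both conditions, so 127.0.0.1 comes back; as soon as L contains an untrusted entry (second call), that entry is returned instead.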

> Now for the change suggested, i.e. requiring "from" before the spam
> address:

Unfortunately, the condition is considerably more complex.

The contents of a Received header are supposed to consist of a list of
(keyword, value) pairs with optional parenthesised comments, followed by a
semicolon and a timestamp. For example, the following header:

Received: from nic.osagesoftware.com (localhost.osagesoftware.com [127.0.0.1])
        by mail.osagesoftware.com (Postfix) with ESMTP id C7CE82FD8F
        for <peak at argo.troja.mff.cuni.cz>; Mon, 28 Jun 2004 19:32:01 -0400 (EDT)

should be interpreted as:

keyword  value                    comment
from     nic.osagesoftware.com    (localhost.osagesoftware.com [127.0.0.1])
by       mail.osagesoftware.com   (Postfix)
with     ESMTP
id       C7CE82FD8F
for      <peak at argo.troja.mff.cuni.cz>

timestamp  Mon, 28 Jun 2004 19:32:01 -0400 (EDT)
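To make the table concrete, here is a minimal Python sketch that splits a Received header body into (keyword, value, comments) clauses plus the timestamp. It assumes the keyword set from the RFC 2821 Received grammar (from, by, via, with, id, for), treats any parenthesised text, nested or not, as a comment attached to the preceding clause, and joins extra words into the current value. `tokens` and `parse_received` are hypothetical names, not Bogofilter functions.

```python
from typing import Iterator, List, Optional, Tuple

# Clause keywords from the RFC 2821 Received grammar (matched case-insensitively).
KEYWORDS = {"from", "by", "via", "with", "id", "for"}

Clause = Tuple[str, str, List[str]]


def tokens(text: str) -> Iterator[Tuple[str, str]]:
    """Yield ("comment", "(...)") and ("word", ...) tokens; comments may nest."""
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text[i] == "(":
            start, depth = i, 0
            while i < n:  # consume up to the matching close paren
                if text[i] == "(":
                    depth += 1
                elif text[i] == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            yield "comment", text[start:i]
        else:
            start = i
            while i < n and not text[i].isspace() and text[i] != "(":
                i += 1
            yield "word", text[start:i]


def parse_received(header: str) -> Tuple[List[Clause], Optional[str]]:
    """Split a Received header body into (keyword, value, comments) clauses
    and the timestamp after the last semicolon outside comments."""
    depth, semi = 0, -1
    for i, ch in enumerate(header):
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif ch == ";" and depth == 0:
            semi = i
    body = header if semi < 0 else header[:semi]
    timestamp = None if semi < 0 else header[semi + 1:].strip()

    clauses: List[list] = []
    for kind, text in tokens(body):
        if kind == "comment":
            if clauses:  # attach the comment to the clause it follows
                clauses[-1][2].append(text)
        elif text.lower() in KEYWORDS and (not clauses or clauses[-1][1]):
            clauses.append([text.lower(), "", []])  # a keyword starts a new clause
        elif clauses:  # any other word extends the current value
            current = clauses[-1]
            current[1] = f"{current[1]} {text}".strip()
    return [(k, v, c) for k, v, c in clauses], timestamp


header = ("from nic.osagesoftware.com (localhost.osagesoftware.com [127.0.0.1]) "
          "by mail.osagesoftware.com (Postfix) with ESMTP id C7CE82FD8F "
          "for <peak at argo.troja.mff.cuni.cz>; Mon, 28 Jun 2004 19:32:01 -0400 (EDT)")
clauses, timestamp = parse_received(header)
for keyword, value, comments in clauses:
    print(keyword, value, *comments)
```

Because a comment is consumed as a single token, a keyword-like word inside it never starts a new clause. Words that appear before the first keyword are simply dropped in this sketch.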

I said Bogofilter should look at the value following the keyword "from"
and at the nearby comments (which usually follow the value). This means it
should start paying attention at "from" (in most cases the first word after
"Received:") and *stop* paying attention at the next keyword ("by" in most
cases).
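The rule just described (look only at the "from" value and its comments, stop at the next keyword) can be sketched in Python as follows, operating on clauses already split as in the table above. The IPv4 pattern and the name `from_clause_addresses` are assumptions for illustration; IPv6 literals are not handled.

```python
import re
from typing import List, Sequence, Tuple

# Dotted-quad IPv4 literal not embedded in a longer run of digits and dots.
IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")

Clause = Tuple[str, str, Sequence[str]]


def from_clause_addresses(clauses: Sequence[Clause]) -> List[str]:
    """Return the IPv4 literals found in the "from" clause only: its value
    and the comments attached to it. Later clauses ("by", "with", ...)
    are never examined."""
    for keyword, value, comments in clauses:
        if keyword == "from":
            text = " ".join([value, *comments])
            return [ip for ip in IPV4.findall(text)
                    if all(int(octet) <= 255 for octet in ip.split("."))]
    return []


# The first example header from this message, already split into clauses.
clauses = [
    ("from", "nic.osagesoftware.com", ["(localhost.osagesoftware.com [127.0.0.1])"]),
    ("by", "mail.osagesoftware.com", ["(Postfix)"]),
    ("with", "ESMTP", []),
    ("id", "C7CE82FD8F", []),
]
print(from_clause_addresses(clauses))  # ['127.0.0.1']
```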

Here is a real-world example demonstrating the problem:

Received: from unknown (HELO ts) (ts at securityoffice.net@[195.174.196.28]) (envelope-sender <ts at securityoffice.net>)
          by 212.98.247.194 (qmail-ldap-1.03) with SMTP
          for <full-disclosure at lists.netsys.com>; 3 Feb 2003 22:06:10 -0000

As you can see, there is an IP address after "by" (qmail does this when
the local name is unknown). This means the local address recorded in the
Received header ("212.98.247.194", after "by") would be picked rather than
the remote address ("195.174.196.28").

Unfortunately, a simple check for "by" is not good enough, because the
word could appear elsewhere (e.g. "(HELO by)"; the contents of this comment
are under the SMTP client's control!).
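A minimal Python sketch of the difference: `naive_from_part` cuts at the first word "by" anywhere in the header, while `from_part` only honours keywords that occur outside parenthesised comments. Both function names, the keyword set and the hostile header (the qmail example above, but with a client that sent "HELO by") are hypothetical; the sketch also assumes the value right after "from" is not itself a keyword.

```python
import re

IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")
# Keywords that may follow the "from" clause (RFC 2821 Received grammar).
NEXT_KEYWORDS = {"by", "via", "with", "id", "for"}


def naive_from_part(header: str) -> str:
    """Naive: cut at the first word "by", wherever it appears."""
    return re.split(r"\bby\b", header, maxsplit=1)[0]


def from_part(header: str) -> str:
    """Cut at the first clause keyword found outside parenthesised
    comments, so words inside "(HELO ...)" are ignored."""
    depth = 0
    for m in re.finditer(r"[()]|[^\s()]+", header):
        token = m.group()
        if token == "(":
            depth += 1
        elif token == ")":
            depth = max(depth - 1, 0)
        elif depth == 0 and token.lower() in NEXT_KEYWORDS:
            return header[:m.start()]
    return header


# Hostile variant of the qmail header: the client sent "HELO by".
hostile = ("from unknown (HELO by) (ts at securityoffice.net@[195.174.196.28]) "
           "by 212.98.247.194 (qmail-ldap-1.03) with SMTP")
print(IPV4.findall(naive_from_part(hostile)))  # []
print(IPV4.findall(from_part(hostile)))        # ['195.174.196.28']
```

The naive cut stops inside the client-supplied comment and loses the remote address; the comment-aware scan skips it and stops at the real "by".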

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."
