spam addrs
Pavel Kankovsky
peak at argo.troja.mff.cuni.cz
Mon Jul 5 00:03:37 CEST 2004
On Mon, 28 Jun 2004, David Relson wrote:
> Remember, your trusted relay list hypothesized having a list. If the
> list included both "1.2.3.4" and "127.0.0.1", then 127.0.0.1 would never
> be returned as the spam address.
127.0.0.1 would be returned as the spam address if and only if the list of
relays extracted from the Received headers, let's call it L, satisfied
the following two conditions:
1. every entry of L was a trusted address,
2. the last entry of L was 127.0.0.1.
Indeed, spam sent from the local machine (or, to be precise, from any
trusted relay) with faked Received headers could confuse the algorithm.
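Here is a minimal sketch of a selection rule consistent with the two conditions above. It assumes that L is ordered from the topmost (most recent) Received header downwards, that the first untrusted entry is the one blamed, and that the last entry is the fallback when every entry is trusted. The function name `spam_address` is hypothetical, and this is not Bogofilter's actual code.

```python
def spam_address(relays, trusted):
    """Pick the address to blame for a message (hypothetical sketch).

    relays  -- the list L of addresses extracted from Received headers,
               assumed ordered from the topmost header downwards
    trusted -- set of trusted relay addresses (assumed to include 127.0.0.1)
    """
    for addr in relays:
        if addr not in trusted:
            return addr              # first untrusted relay is blamed
    # Every entry is trusted: fall back to the last (oldest) entry of L.
    return relays[-1] if relays else None
```

With trusted = {"1.2.3.4", "127.0.0.1"}, the list ["1.2.3.4", "127.0.0.1"] yields "127.0.0.1" because both conditions hold. Putting any untrusted address into the list yields that address instead. Forged Received headers that name only trusted addresses reach the fallback in the same way.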
> Now for the change suggested, i.e. requiring "from" before the spam
> address:
Unfortunately, the condition is considerably more complex.
The contents of a Received header are supposed to consist of a list of
(keyword, value) pairs with optional parenthesised comments, followed by a
semicolon and a timestamp. For example, the following header:
Received: from nic.osagesoftware.com (localhost.osagesoftware.com [127.0.0.1])
by mail.osagesoftware.com (Postfix) with ESMTP id C7CE82FD8F
for <peak at argo.troja.mff.cuni.cz>; Mon, 28 Jun 2004 19:32:01 -0400 (EDT)
should be interpreted as:
keyword    value                             comment
from       nic.osagesoftware.com             (localhost.osagesoftware.com [127.0.0.1])
by         mail.osagesoftware.com            (Postfix)
with       ESMTP
id         C7CE82FD8F
for        <peak at argo.troja.mff.cuni.cz>
timestamp  Mon, 28 Jun 2004 19:32:01 -0400   (EDT)
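The interpretation in the table can be sketched as a small tokenizer plus a keyword-driven parser. This is a hypothetical Python illustration that makes three assumptions: the clause keywords are the RFC 2821 set (from, by, via, with, id, for), the timestamp follows the last semicolon, and the first word after a keyword is always its value. `tokenize` and `parse_received` are illustrative names, not Bogofilter functions.

```python
# Clause keywords of the Received grammar (assumption: the RFC 2821 set).
KEYWORDS = {"from", "by", "via", "with", "id", "for"}


def tokenize(text):
    """Split text into ("word", ...) and ("comment", ...) tokens.

    Parenthesised comments may nest; an unbalanced comment runs to the end.
    """
    tokens, i, n = [], 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text[i] == "(":
            start, depth = i, 0
            while i < n:
                ch = text[i]
                i += 1
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
            tokens.append(("comment", text[start:i]))
        else:
            start = i
            while i < n and not text[i].isspace() and text[i] != "(":
                i += 1
            tokens.append(("word", text[start:i]))
    return tokens


def parse_received(header):
    """Return ([(keyword, value, [comments]), ...], timestamp)."""
    body, sep, timestamp = header.rpartition(";")
    if not sep:                          # no "; timestamp" part
        body, timestamp = header, ""
    fields = []
    for kind, text in tokenize(body):
        if kind == "comment":
            if fields:
                fields[-1][2].append(text)   # attach to the current clause
        elif fields and fields[-1][1] is None:
            fields[-1][1] = text             # first word after a keyword is its value
        elif text.lower() in KEYWORDS:
            fields.append([text.lower(), None, []])
        elif fields:
            fields[-1][1] += " " + text      # value containing spaces
    return [tuple(f) for f in fields], timestamp.strip()
```

Applied to the Postfix header above, it returns the from, by, with, id and for clauses with their comments, plus the timestamp (this sketch leaves the "(EDT)" comment attached to the timestamp). Because comments are consumed whole, a keyword-like word inside a comment never starts a new clause.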
I said Bogofilter should look at the value following "from" (the keyword)
and nearby comments (usually following the value). This means it should
start paying attention at "from" (this is the first word after Received:
in most cases) and *stop* paying attention at the following keyword ("by"
in most cases).
Here is a real-world example demonstrating the problem:
Received: from unknown (HELO ts) (ts at securityoffice.net@[195.174.196.28]) (envelope-sender <ts at securityoffice.net>)
by 212.98.247.194 (qmail-ldap-1.03) with SMTP
for <full-disclosure at lists.netsys.com>; 3 Feb 2003 22:06:10 -0000
As you can see, there is an IP address after "by" (qmail does this when
the local name is unknown). This means the local address in the Received
header ("212.98.247.194", after "by") would be picked rather than the
remote address ("195.174.196.28").
Unfortunately, a simple check for "by" is not good enough because that
word could appear elsewhere (e.g. "(HELO by)"; the contents of this text
are under the SMTP client's control!).
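One way to implement the start/stop rule without falling into the "(HELO by)" trap is to treat each comment as a whole token when looking for the next keyword. Below is a rough Python sketch. It assumes comments nest at most one level, uses the RFC 2821 keyword set, and handles IPv4 addresses only. `from_clause_ips` is a hypothetical name.

```python
import re

# Clause keywords of the Received grammar (assumption: the RFC 2821 set).
KEYWORDS = {"from", "by", "via", "with", "id", "for"}

# A token is a parenthesised comment (assumption: no nested parentheses)
# or a run of characters that are neither whitespace nor parentheses.
TOKEN = re.compile(r"\([^()]*\)|[^\s()]+")
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def from_clause_ips(header):
    """Return the IPv4 addresses found in the "from" clause of a Received header.

    The clause starts at the bare word "from" and stops at the next bare
    keyword; words inside comments such as "(HELO by)" never stop it.
    """
    body = header.rpartition(";")[0] or header   # drop "; timestamp" if present
    clause = []
    started = expect_value = False
    for tok in TOKEN.findall(body):
        is_comment = tok.startswith("(")
        if not started:
            # Skip anything before "from" (e.g. a leading "Received:").
            if not is_comment and tok.lower() == "from":
                started = expect_value = True
            continue
        if expect_value and not is_comment:
            clause.append(tok)   # the value, even if the client chose "by"
            expect_value = False
        elif not is_comment and tok.lower() in KEYWORDS:
            break                # start of the next clause, usually "by"
        else:
            clause.append(tok)   # comments and any extra words
    return IPV4.findall(" ".join(clause))
```

On the qmail header above it returns ["195.174.196.28"], because the scan stops at the bare "by" and never reaches "212.98.247.194". Taking the first word after "from" unconditionally also covers MTAs that write the HELO name bare after "from" (sendmail, for example), where a client announcing itself as "by" would otherwise end the clause early. The sketch does nothing about forged Received headers further down the chain.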
--Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."
More information about the Bogofilter mailing list