spam addrs

David Relson relson at osagesoftware.com
Tue Jun 15 01:36:31 CEST 2004


Greetings,

I've been looking at bogofilter's parsing code with an eye to making a
message's IP address available for logging.  

Bogofilter's lexer already has an IPADDR pattern for identifying IP
addresses and returns an appropriate type to the get_token() function.
The function also knows when it's processing a Received: header
statement.  Together those two bits of info form the basis of keeping
the message's IP address for (optional) use in logging messages.

My first version saved the first IP address seen in a Received:
statement.  This works fine in many cases, for example:

Received: from aol.com (machine.domain.com [192.255.1.2])

However, if the machine name is of the form "1.2.3.4.domain.com", the
IPADDR pattern matches the leading part of the name and the saved value
will be "1.2.3.4", which is wrong.

The second version is a bit more complex.  It saves the last IP address
of the first Received: statement containing an IP address.  That will
give the correct answer for:

Received: (qmail 937 invoked from network); 2 Feb 2004 19:21:52 -0000
Received: from natmout00.rzone.de (natmout00.rzone.de [81.169.145.163])
	by mail.nn7.de (8.12.10/8.12.10) with ESMTP id i12JLAWl009417
	for <bugreports at nn7.de>; Mon, 2 Feb 2004 20:21:10 +0100 (MET)

but not for:

Received: (qmail 937 invoked from network); 2 Feb 2004 19:21:52 -0000
Received: from unknown (HELO localhost) (127.0.0.1)
  by localhost with SMTP; 2 Feb 2004 19:21:52 -0000
Received: from natmout00.rzone.de (natmout00.rzone.de [81.169.145.163])
	by mail.nn7.de (8.12.10/8.12.10) with ESMTP id i12JLAWl009417
	for <bugreports at nn7.de>; Mon, 2 Feb 2004 20:21:10 +0100 (MET)

The third version works like the second, but ignores 127.0.0.1 when
looking for an IP address.

The actual work is done by a simple state machine in token.c, a global
variable ipaddr for saving the value, and 'I' recognition in format.c.
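
To make the rule concrete, here is a minimal standalone sketch of such a
state machine.  It is not the code in token.c: the token kinds, the
ip_tracker struct and the tracker_* function names are all made up for
illustration, and it assumes the lexer hands over one token at a time,
flagging the start of each header and each IPADDR match.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; bogofilter's lexer uses its own names. */
typedef enum { TOK_RECEIVED, TOK_OTHER_HEADER, TOK_IPADDR, TOK_WORD } tok_kind;

typedef enum { ST_SEARCHING, ST_IN_RECEIVED, ST_DONE } ip_state;

typedef struct {
    ip_state state;
    char candidate[64];   /* last usable address in the current Received: */
    char ipaddr[64];      /* saved result, empty until one is chosen */
} ip_tracker;

static void tracker_init(ip_tracker *t)
{
    t->state = ST_SEARCHING;
    t->candidate[0] = '\0';
    t->ipaddr[0] = '\0';
}

/* At the end of a Received: statement, keep its last usable address. */
static void tracker_commit(ip_tracker *t)
{
    if (t->state == ST_IN_RECEIVED && t->candidate[0] != '\0') {
        snprintf(t->ipaddr, sizeof t->ipaddr, "%s", t->candidate);
        t->state = ST_DONE;
    }
}

static void tracker_feed(ip_tracker *t, tok_kind kind, const char *text)
{
    if (t->state == ST_DONE)
        return;                       /* first qualifying header wins */

    switch (kind) {
    case TOK_RECEIVED:                /* a new Received: statement starts */
        tracker_commit(t);
        if (t->state != ST_DONE) {
            t->state = ST_IN_RECEIVED;
            t->candidate[0] = '\0';
        }
        break;
    case TOK_OTHER_HEADER:            /* some other header starts */
        tracker_commit(t);
        if (t->state != ST_DONE)
            t->state = ST_SEARCHING;
        break;
    case TOK_IPADDR:                  /* later addresses overwrite earlier ones */
        if (t->state == ST_IN_RECEIVED && strcmp(text, "127.0.0.1") != 0)
            snprintf(t->candidate, sizeof t->candidate, "%s", text);
        break;
    case TOK_WORD:
        break;
    }
}

/* Call once at the end of the headers to flush a pending candidate. */
static void tracker_finish(ip_tracker *t)
{
    tracker_commit(t);
}
```

Feeding this sketch the tokens of the three-header example above (a
Received: with no address, one with only 127.0.0.1, then one with
81.169.145.163) leaves "81.169.145.163" in ipaddr, and a header whose
hostname starts with "1.2.3.4" followed by a bracketed address keeps the
bracketed one because it comes last.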

Use of the capability is via bogofilter.cf statements like:

header_format = "%h: %c, spamicity=%p, version=%v, ipaddr=%I"
log_header_format = "%h: %c, spamicity=%p, version=%v, ipaddr=%I"

In cases where the message is forwarded multiple times by internal mail
servers, this method of identifying the IP address will likely identify
one of those servers, making the saved information not useful.  When/if
someone has more complex needs, they can ask for help in implementing
what they actually need.

As has been mentioned, using this IP address for blacklisting should
only be done after further checking and/or validation of the address.

Any comments?

David


