spam addrs

David Relson relson at osagesoftware.com
Mon Jun 28 14:01:57 CEST 2004


On Mon, 28 Jun 2004 11:05:08 +0200 (CEST)
Pavel Kankovsky wrote:

> On Mon, 14 Jun 2004, David Relson wrote:
> 
> > The second version is a bit more complex.  Save the last IP address
> > of the first Received: statement containing an IP address.  That
> > will give the correct answer for:
> 
> This should be restricted to the value following "from" and subsequent
> comments. It is possible (although quite unlikely) to encounter an IP
> address in other fields of Received header, esp. "by" and "for" (see
> RFC 2821, section 4.4 for details). Eg.
>    Received: from 1.2.3.4 by 5.6.7.8 for xyz@[9.10.11.12]; ....
> 
> The bad news is the tokenizer does not care about parentheses, and the
> following lines, which have completely different meanings,
>    Received: from word1 (by word2) by word3....
>    Received: from word1 by word2 (by word3)....
> are indistinguishable after tokenization.
> 
> Personally, I'd restrict it further to IP addresses
> 1. enclosed by brackets i.e. "[1.2.3.4]" (Sendmail/Postfix style), or
> 2. enclosed by parentheses and optionally prefixed by "user@"
>    i.e. "(1.2.3.4)" or "(user@1.2.3.4)" (qmail style), or
> 3. (unless anything matching (1) or (2) is found) following "from"
>    immediately i.e. "from 1.2.3.4"
> 
> But again, this cannot be done after tokenization.
> 
> > The third version excludes "but not 127.0.0.1".
> 
> It would be cool if the list of "trusted relays" was configurable.
> The program would (try to) skip any Received headers indicating the
> mail arrived from one of the listed trusted relays. This would solve
> the problem of mail received indirectly.
> 
> But there's a catch:
>    Received: from trusted.relay ([1.2.3.4])...
>    Received: from localhost ([127.0.0.1]) ...
> would return bogus 127.0.0.1 rather than the trusted.relay's IP
> address.

Hi Pavel,

Your comments about parentheses and brackets are correct.  Bogofilter
ignores them.

Remember, your trusted relay proposal assumes a configurable list.  If
the list included both "1.2.3.4" and "127.0.0.1", then 127.0.0.1 would
never be returned as the spam address.
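
A rough sketch of that idea, assuming the addresses from the Received:
lines have already been collected topmost first.  The names
is_trusted_relay and first_untrusted are hypothetical and are not part
of bogofilter; matching is plain string comparison with no netmask
support.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if ip exactly matches an entry in the
 * trusted list, 0 otherwise. */
int is_trusted_relay(const char *ip,
                     const char *const *trusted, size_t ntrusted)
{
    size_t i;

    assert(ip != NULL);
    for (i = 0; i < ntrusted; i++) {
        if (strcmp(ip, trusted[i]) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: given the addresses found in the Received: lines
 * (topmost first), returns the first one that is not a trusted relay,
 * or NULL if every address is trusted. */
const char *first_untrusted(const char *const *addrs, size_t naddrs,
                            const char *const *trusted, size_t ntrusted)
{
    size_t i;

    for (i = 0; i < naddrs; i++) {
        if (!is_trusted_relay(addrs[i], trusted, ntrusted))
            return addrs[i];
    }
    return NULL;
}
```

With both "1.2.3.4" and "127.0.0.1" in the trusted list, neither can be
returned; the result is the next address down, or NULL if there is none.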

By the way, it occurs to me that "spam address" is a better name for
this feature and the format spec should be "%S" rather than "%I". 
Anybody care if I change it???

Now for the change suggested, i.e. requiring "from" before the spam
address:

File lexer_v3.l has the task of parsing the message and creating raw
tokens.  File token.c gets the raw tokens and has the responsibility for
creating the tokens used in scoring.  Through a simple state machine,
its get_token function adds the "head:" and other prefixes and deals
with subnet tokens, i.e. converting 123.45.67.89 to ip:123.45.67.89,
ip:123.45.67, ip:123.45, and ip:123.  The state machine also deals with
Received: lines and saving of the IP address.
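
To illustrate the subnet tokens, here is a minimal standalone sketch.
It is not the code in token.c; ip_prefix_token is a hypothetical helper
that builds one "ip:" token from the leading components of a
dotted-quad string, and it does no validation of the octets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not bogofilter's code: writes "ip:" followed by
 * the first `octets` dot-separated components of ip into out.
 * Returns 0 on success, -1 if ip has fewer components than requested
 * or out is too small. */
int ip_prefix_token(const char *ip, int octets, char *out, size_t outsz)
{
    const char *end;
    int seen = 0;
    int n;

    assert(ip != NULL && out != NULL && octets >= 1);

    /* Stop at the dot that ends the requested number of components. */
    for (end = ip; *end != '\0'; end++) {
        if (*end == '.') {
            seen++;
            if (seen == octets)
                break;
        }
    }
    if (*end == '\0')
        seen++;    /* the last component has no trailing dot */
    if (seen < octets)
        return -1;

    n = snprintf(out, outsz, "ip:%.*s", (int)(end - ip), ip);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Calling it with octets set to 4, 3, 2, and 1 on "123.45.67.89" yields
the four tokens listed above.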

The patch below adds in a check for "from" so that the address saved
follows "from".  I think it implements what you describe.  Let me know.
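
For anyone who wants to see the state transitions outside bogofilter,
here is a stripped-down sketch of the same idea.  It is an
illustration, not the patch: looks_like_ipaddr and spam_addr are
hypothetical names, the token array stands in for the lexer's output
for a single Received: line, and the IP check is a crude substitute
for the lexer's IPADDR rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { R_INIT, R_FROM, R_SAVE, R_DONE } R_STATE;

/* Crude stand-in for the lexer's IPADDR token: exactly four
 * dot-separated groups of 1-3 digits.  Octets are not range-checked. */
int looks_like_ipaddr(const char *s)
{
    int groups = 0;

    while (*s != '\0') {
        int digits = 0;
        while (*s >= '0' && *s <= '9') {
            digits++;
            s++;
        }
        if (digits < 1 || digits > 3)
            return 0;
        groups++;
        if (*s == '.') {
            s++;
            if (*s == '\0')
                return 0;    /* trailing dot */
        } else if (*s != '\0') {
            return 0;        /* unexpected character */
        }
    }
    return groups == 4;
}

/* Walks the tokens of one Received: line and returns the first IP
 * address seen after a "from" token, skipping 127.0.0.1.
 * Returns NULL if no such address is found. */
const char *spam_addr(const char *const *tokens, size_t count)
{
    R_STATE r_state = R_INIT;
    const char *saved = NULL;
    size_t i;

    for (i = 0; i < count && r_state != R_SAVE; i++) {
        if (r_state == R_INIT && strcmp(tokens[i], "from") == 0) {
            r_state = R_FROM;
        } else if (r_state == R_FROM &&
                   looks_like_ipaddr(tokens[i]) &&
                   strcmp(tokens[i], "127.0.0.1") != 0) {
            saved = tokens[i];
            r_state = R_SAVE;
        }
    }
    return saved;
}
```

Note that the sketch keeps looking after "from" until it finds an
address, so a line whose "from" field carries no IP address would
yield the first later address (for example one following "by").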

Disclaimer: the IP address found is the message's originating IP address
iff your MTA adds a compatible Received: line.

Regards,

David

diff -u -r1.83 token.c
--- token.c	27 Jun 2004 15:33:29 -0000	1.83
+++ token.c	28 Jun 2004 11:55:46 -0000
@@ -26,7 +26,7 @@
 #include "token.h"
 #include "xmemrchr.h"
 
-typedef enum { R_INIT, R_SAVE, R_DONE } R_STATE;
+typedef enum { R_INIT, R_FROM, R_SAVE, R_DONE } R_STATE;
 
 /* Local Variables */
 
@@ -154,6 +154,11 @@
 	case TOKEN:	/* ignore anything when not reading text MIME types */
 	    if (token_prefix != NULL) {
 		word_t *o = yylval;
+		if (r_state == R_INIT &&
+		    token_prefix == w_recv &&
+		    strcmp(yylval->text, "from") == 0) {
+		    r_state = R_FROM;
+		}
 		yylval = word_concat(token_prefix, yylval);
 		word_free(o);
 	    }
@@ -174,9 +179,8 @@
 	    break;
 
 	case IPADDR:
-	    if ((token_prefix == w_recv) &&
-		(r_state == R_INIT || r_state == R_SAVE) &&
-		(strcmp(yylval->text, "127.0.0.1") != 0)) {
+	    if (r_state == R_FROM &&
+		strcmp(yylval->text, "127.0.0.1") != 0) {
 		/* Not guaranteed to be the originating address of the message. */
 		r_state = R_SAVE;
 		word_free(ipaddr);


