spam addrs
David Relson
relson at osagesoftware.com
Mon Jun 28 14:01:57 CEST 2004
On Mon, 28 Jun 2004 11:05:08 +0200 (CEST)
Pavel Kankovsky wrote:
> On Mon, 14 Jun 2004, David Relson wrote:
>
> > The second version is a bit more complex. Save the last IP address
> > of the first Received: statement containing an IP address. That
> > will give the correct answer for:
>
> This should be restricted to the value following "from" and subsequent
> comments. It is possible (although quite unlikely) to encounter an IP
> address in other fields of Received header, esp. "by" and "for" (see
> RFC 2821, section 4.4 for details). Eg.
> Received: from 1.2.3.4 by 5.6.7.8 for xyz@[9.10.11.12]; ....
>
> The bad news is the tokenizer does not care about parentheses, and the
> following lines having a completely different meaning
> Received: from word1 (by word2) by word3....
> Received: from word1 by word2 (by word3)....
> are indistinguishable after tokenization.
>
> Personally, I'd restrict it further to IP addresses
> 1. enclosed by brackets i.e. "[1.2.3.4]" (Sendmail/Postfix style), or
> 2. enclosed by parentheses and optionally prefixed by "user@"
> i.e. "(1.2.3.4)" or "(user@1.2.3.4)" (qmail style), or
> 3. (unless anything matching (1) or (2) is found) following "from"
> immediately i.e. "from 1.2.3.4"
>
> But again, this cannot be done after tokenization.
>
> > The third version excludes "but not 127.0.0.1".
>
> It would be cool if the list of "trusted relays" was configurable.
> The program would (try to) skip any Received headers indicating the
> mail arrived from one of the listed trusted relays. This would solve
> the problem of mail received indirectly.
>
> But there's a catch:
> Received: from trusted.relay ([1.2.3.4])...
> Received: from localhost ([127.0.0.1]) ...
> would return bogus 127.0.0.1 rather than the trusted.relay's IP
> address.
Hi Pavel,
Your comments about parentheses and brackets are correct. Bogofilter
ignores them.
Remember, your trusted relay suggestion hypothesized a configurable list. If the
list included both "1.2.3.4" and "127.0.0.1", then 127.0.0.1 would never
be returned as the spam address.
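To make the trusted relay idea concrete, here is a minimal sketch. It is not bogofilter code: the names is_trusted and spam_addr are made up for illustration, and it assumes one candidate IP address has already been pulled from each Received: line, in header order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of bogofilter: returns 1 if addr is in
 * the trusted relay list, 0 otherwise. */
static int is_trusted(const char *addr, const char *const *trusted, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(addr, trusted[i]) == 0)
            return 1;
    return 0;
}

/* addrs[] holds one candidate IP address per Received: line, in the
 * order the headers appear in the message (newest first).
 * Returns the first address that is not a trusted relay, or NULL if
 * every address is trusted. */
static const char *spam_addr(const char *const *addrs, size_t n_addrs,
                             const char *const *trusted, size_t n_trusted)
{
    size_t i;
    for (i = 0; i < n_addrs; i++)
        if (!is_trusted(addrs[i], trusted, n_trusted))
            return addrs[i];
    return NULL;
}
```

With both "1.2.3.4" and "127.0.0.1" in the trusted list, the two Received: lines from your example are skipped and the next untrusted address, if any, is returned.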
By the way, it occurs to me that "spam address" is a better name for
this feature and the format spec should be "%S" rather than "%I".
Anybody care if I change it???
Now for the change suggested, i.e. requiring "from" before the spam
address:
File lexer_v3.l has the task of parsing the message and creating raw
tokens. File token.c gets the raw tokens and has the responsibility for
creating the tokens used in scoring. Through a simple state machine,
its get_token function adds the "head:" and other prefixes, deals with
subnet tokens, i.e. converting 123.45.67.89 to ip:123.45.67.89,
ip:123.45.67, ip:123.45, and ip:123. The state machine also deals with
Received: lines and saving of the IP address.
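For anyone not familiar with the subnet tokens, the expansion can be sketched roughly as follows. This is a standalone illustration, not the code in token.c; subnet_tokens, MAX_SUBNET_TOKENS and TOKEN_LEN are names invented for the example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SUBNET_TOKENS 4
#define TOKEN_LEN 32

/* Hypothetical helper: fills out[] with "ip:"-prefixed tokens for the
 * full dotted address and each shorter prefix. Returns the number of
 * tokens written, or 0 if the address is too long for the buffers. */
static int subnet_tokens(const char *addr, char out[][TOKEN_LEN], int max)
{
    char buf[TOKEN_LEN];
    int n = 0;

    if (strlen(addr) >= sizeof buf - 3)   /* leave room for "ip:" */
        return 0;
    strcpy(buf, addr);

    while (n < max) {
        char *dot;
        snprintf(out[n++], TOKEN_LEN, "ip:%s", buf);
        dot = strrchr(buf, '.');
        if (dot == NULL)
            break;
        *dot = '\0';                      /* drop the last octet */
    }
    return n;
}
```

Called with "123.45.67.89", it fills the array with the four tokens listed above, longest first.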
The patch below adds in a check for "from" so that the address saved
follows "from". I think it implements what you describe. Let me know.
Disclaimer: the IP address found is the message's originating IP address
iff your MTA adds a compatible Received: line.
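If it helps to see the "from" check in isolation, here is a simplified, standalone sketch of the same idea. It reuses the R_STATE names from the patch, but everything else is an assumption made for the example: looks_like_ip is a crude stand-in for the lexer's IPADDR token, and addr_after_from works on the already-split words of a single Received: line rather than on bogofilter's token stream.

```c
#include <stdio.h>
#include <string.h>

typedef enum { R_INIT, R_FROM, R_SAVE, R_DONE } R_STATE;

/* Crude dotted-quad check, standing in for the lexer's IPADDR token. */
static int looks_like_ip(const char *s)
{
    unsigned a, b, c, d;
    char extra;
    return sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) == 4
        && a < 256 && b < 256 && c < 256 && d < 256;
}

/* tokens[] are the words of one Received: line, already split.
 * Returns the first non-loopback IP address seen after "from",
 * or NULL if there is none. */
static const char *addr_after_from(const char *const *tokens, size_t n)
{
    R_STATE state = R_INIT;
    const char *saved = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (state == R_INIT && strcmp(tokens[i], "from") == 0) {
            state = R_FROM;               /* now willing to save an address */
        } else if (state == R_FROM && looks_like_ip(tokens[i])
                   && strcmp(tokens[i], "127.0.0.1") != 0) {
            saved = tokens[i];
            state = R_SAVE;               /* later addresses are ignored */
        }
    }
    return saved;
}
```

For the words of "from 1.2.3.4 by 5.6.7.8 for 9.10.11.12" this returns 1.2.3.4, and a line with no "from" yields NULL. Note that the sketch does not leave R_FROM when it sees "by", so "from word1 by 5.6.7.8" would still return 5.6.7.8.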
Regards,
David
diff -u -r1.83 token.c
--- token.c 27 Jun 2004 15:33:29 -0000 1.83
+++ token.c 28 Jun 2004 11:55:46 -0000
@@ -26,7 +26,7 @@
#include "token.h"
#include "xmemrchr.h"
-typedef enum { R_INIT, R_SAVE, R_DONE } R_STATE;
+typedef enum { R_INIT, R_FROM, R_SAVE, R_DONE } R_STATE;
/* Local Variables */
@@ -154,6 +154,11 @@
case TOKEN: /* ignore anything when not reading text MIME types */
if (token_prefix != NULL) {
word_t *o = yylval;
+ if (r_state == R_INIT &&
+ token_prefix == w_recv &&
+ strcmp(yylval->text, "from") == 0) {
+ r_state = R_FROM;
+ }
yylval = word_concat(token_prefix, yylval);
word_free(o);
}
@@ -174,9 +179,8 @@
break;
case IPADDR:
- if ((token_prefix == w_recv) &&
- (r_state == R_INIT || r_state == R_SAVE) &&
- (strcmp(yylval->text, "127.0.0.1") != 0)) {
+ if (r_state == R_FROM &&
+ strcmp(yylval->text, "127.0.0.1") != 0) {
/* Not guaranteed to be the originating address of the message.
*/
r_state = R_SAVE;
word_free(ipaddr);