Much simplified lexer

michael at optusnet.com.au michael at optusnet.com.au
Sun Nov 16 23:09:13 CET 2003


Matthias Andree <matthias.andree at gmx.de> writes:
> On Fri, 14 Nov 2003, michael at optusnet.com.au wrote:
[...]
> I don't see it as a bad idea. The goal of these rules is to
> reduce the count of unique tokens in the data base that aren't
> indicative for spam, in other words: avoid ballast.
> 
> Locally-generated Received headers can be near-unique (they aren't on
> some systems, inode numbers can be recycled for instance, these are only
> unique for a given point in time), but they are not sent by the spammer,
> and they are at least not surprising to the end user, hence they (the
> locally generated headers at large) carry no entropy, they contain no
> information.

Sorry, that's absolutely wrong. A fair bit of spam that I get has
a _single_ Received line. (i.e. the only one that's there is
what my system added). It's stuffed full of good info.

Received: from 1.2.3.4 (mctn1-7619.nb.aliant.net [156.34.21.199])
        by funny.optusnet.com.au (8.12.8/8.12.8) with SMTP id hAC2vOQa029135
        for <james at plastic.whatnot.net.au>; Wed, 12 Nov 2003 13:57:53 +1100

The 1.2.3.4 comes from the spammer. You'll note they lied. The _real_ source
is in the parentheses following. The 'funny.optusnet' is my system. The
'james at plastic' is the spammer.

The '1.2.3.4' part and the bit in parentheses are very rich token
sources. (Open relays, for example, will have their IP address
show up in there and very quickly be marked as very spammy tokens.)
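
To make that concrete, here is a rough sketch of pulling those two pieces
out of a Received line like the one quoted above. It is not what bogofilter's
lexer does; the regular expression only handles this sendmail-style layout,
and the "rcvd:" token prefixes and the received_tokens name are made up for
illustration.

```python
import re

# Hypothetical pattern for a sendmail-style "from" clause:
#   from <claimed name> (<reverse DNS name> [<connecting IP>])
# Real-world Received headers vary a lot; this covers only that layout.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+"
    r"\((?P<rdns>[^\s\[\]]+)?\s*\[(?P<ip>[0-9.]+)\]\)"
)

def received_tokens(header):
    """Return tokens from the 'from' clause of one Received header."""
    # Unfold continuation lines so the clause sits on a single line.
    unfolded = " ".join(header.split())
    m = RECEIVED_RE.search(unfolded)
    if m is None:
        return []
    # The claimed name is sender-supplied; the IP is what the local MTA saw.
    tokens = ["rcvd:helo:" + m.group("helo"), "rcvd:ip:" + m.group("ip")]
    if m.group("rdns"):
        tokens.append("rcvd:rdns:" + m.group("rdns"))
    return tokens

example = """Received: from 1.2.3.4 (mctn1-7619.nb.aliant.net [156.34.21.199])
        by funny.optusnet.com.au (8.12.8/8.12.8) with SMTP id hAC2vOQa029135
        for <james at plastic.whatnot.net.au>; Wed, 12 Nov 2003 13:57:53 +1100"""

print(received_tokens(example))
```

Throwing away the whole locally added header would throw away all three of
these tokens, even though only the "by ..." part is purely local.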
 
[...]
> All mails that I receive have three locally-generated Received: headers,
> one from my upstream POP3 server, one from fetchmail, one from my local
> Postfix. Discarding these three lines will not discard information that
> had been present at the originator's (spammer's) site.

That's your configuration, and it's a _relatively_ rare one. The
common case is people POP'ing their mail straight off their ISP's mail
server.

Michael.



