message tags
David Relson
relson at osagesoftware.com
Tue Sep 7 14:59:42 CEST 2004
On Tue, 07 Sep 2004 08:26:46 -0400
Tom Allison wrote:
> David Relson wrote:
> > Matthias,
> >
> > This month I've received 18,000+ spam messages. Out of curiosity I ran a
> > modified bogolexer and printed the 3 attributes that bogofilter
> > identifies - msg_addr, msg_id, and queue_id. I then sorted and
> > counted them. Here's a table of counts vs repeats:
> >
> > 15841 1
> > 537 2
> > 304 3
> > 256 4
> > 18 5
> >
> > For example, 15,841 messages were unique (count=1), and there were 18
> > messages for which 5 copies were received. Checking further, each of
> > the 18 message sets contains 5 addresses for my domain, e.g.:
> >
> > To: linda at example.com
> > Cc: relson at example.com, eric at example.com,
> > mark at example.com, webmaster at example.com
> >
> > The actual messages differ only in the "X-Original-To:" and
> > "Delivered-To:" attributes:
> >
>
> Do these 18 emails use the same five Message-ID values?
> Are the Messages themselves (Body/Subject) actually different?
>
> I ask because I would expect that the formail -D option may take care
> of these for you.
Tom,
I see duplication of Message-ID/Queue-ID values happening when a spammer
has an address list sorted by domain and sends multiple messages in one
SMTP session. The Message-ID is repeated during transmission and the
Queue-ID is replicated during receipt. The messages are identical
_except_ that the first is "To: linda at example.com", the second is "To:
relson at example.com", ... Given the different "To:" lines, checksumming
gives different results (unless the "To:" line is deleted).
David
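
The checksumming point above can be illustrated with a minimal sketch. This is not how bogofilter or formail work; it is just one way to compute a digest that ignores the per-recipient headers, so that copies of the same spam hash alike. The function name copy_digest and the list of headers to skip are assumptions made for this example (the thread only mentions "To:", "X-Original-To:" and "Delivered-To:"; "Cc" is added as a guess).

```python
import hashlib
from email import message_from_string

# Assumption: headers that may differ between copies of the same message.
VOLATILE_HEADERS = ("To", "Cc", "X-Original-To", "Delivered-To")


def copy_digest(raw_message, skip=VOLATILE_HEADERS):
    """Return a SHA-1 hex digest of a message with the `skip` headers removed."""
    msg = message_from_string(raw_message)
    for name in skip:
        # Deleting a header removes every occurrence and is a no-op if absent.
        del msg[name]
    # Re-serialize the remaining headers and the body, then hash the result.
    flattened = msg.as_string()
    return hashlib.sha1(flattened.encode("utf-8", "replace")).hexdigest()


if __name__ == "__main__":
    first = "Message-ID: <1@x>\nTo: linda@example.com\nSubject: hi\n\nbody\n"
    second = "Message-ID: <1@x>\nTo: relson@example.com\nSubject: hi\n\nbody\n"
    # Same digest once the recipient headers are ignored.
    print(copy_digest(first) == copy_digest(second))
    # Different digests when nothing is skipped.
    print(copy_digest(first, skip=()) == copy_digest(second, skip=()))
```

With the recipient headers left in (skip=()), the two copies hash differently, which matches the behaviour described above.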