message tags

David Relson relson at osagesoftware.com
Tue Sep 7 14:59:42 CEST 2004


On Tue, 07 Sep 2004 08:26:46 -0400
Tom Allison wrote:

> David Relson wrote:
> > Matthias,
> > 
> > This month I've received 18,000+ spam messages.  Out of curiosity I
> > ran a modified bogolexer and printed the 3 attributes that bogofilter
> > identifies - msg_addr, msg_id, and queue_id.  I then sorted and
> > counted them.  Here's a table of message counts vs. copies received:
> > 
> >   15841 1
> >     537 2
> >     304 3
> >     256 4
> >      18 5
> > 
> > For example, 15,841 messages were unique (count=1), and there were 18
> > messages of which 5 copies were received.  Checking further, each of
> > the 18 message sets contains 5 addresses for my domain, e.g.:
> > 
> >   To: linda at example.com
> >   Cc: relson at example.com, eric at example.com,
> > 	mark at example.com, webmaster at example.com
> > 
> > The actual messages differ only in the "X-Original-To:" and
> > "Delivered-To:" attributes:
> > 
> 
> Do these 18 emails use the same five Message-ID values?
> Are the Messages themselves (Body/Subject) actually different?
> 
> I ask because I would expect that the formail -D option may take care
> of these for you.

Tom,

I see duplication of Message-ID/Queue-ID values happening when a spammer
has an address list sorted by domain and sends multiple messages in one
SMTP session.  The Message-ID is repeated during transmission and the
Queue-ID is replicated during receipt.  The messages are identical
_except_ for the "To:" line: the first has "To: linda at example.com",
the second has "To: relson at example.com", and so on.  Given the
different "To:" lines, checksumming the messages gives different results
(unless the "To:" line is removed before checksumming).
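
As a rough illustration of that last point, here is a minimal sketch
using Python's standard email and hashlib modules.  It is not part of
bogofilter or formail; the function name content_checksum and the list
of ignored headers are assumptions chosen for this example.

```python
import hashlib
from email import message_from_string

# Headers assumed to vary per recipient (illustrative list, not exhaustive).
PER_RECIPIENT_HEADERS = ("To", "X-Original-To", "Delivered-To")


def content_checksum(raw_message, ignore=PER_RECIPIENT_HEADERS):
    """Return a SHA-1 hex digest of a message with the given headers removed."""
    msg = message_from_string(raw_message)
    for name in ignore:
        # Deleting a header removes every occurrence of it and is a no-op
        # if the header is absent.
        del msg[name]
    return hashlib.sha1(msg.as_string().encode("utf-8")).hexdigest()
```

With the per-recipient headers removed, copies that differ only in those
headers should produce the same digest; with nothing removed, the digests
differ.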

David


