Marking Attachments [was: chinese-korean-non_latin spam]

Tim Freeman tim at fungible.com
Sat Mar 8 07:02:24 CET 2003


>Marking attachments ...  Sounds like an interesting concept.  Would you 
>care to elaborate on what you want to have happen?

Here's an example.  This message arrived carrying an executable
(probably a worm).  I have stripped out many of the irrelevant header
lines and most of the executable payload:

   From: <big at boss.com>
   To: <censored at censored>
   Subject: Re: Movies
   Date: Tue, 4 Feb 2003 3:11:54 +0100
   Importance: Normal
   MIME-Version: 1.0
   Content-Type: multipart/mixed;
   	boundary="CSmtpMsgPart123X456_000_01148142"

   This is a multipart message in MIME format

   --CSmtpMsgPart123X456_000_01148142
   Content-Type: text/plain;
   	charset="iso-8859-1"
   Content-Transfer-Encoding: 7bit

   Attached file:
   --CSmtpMsgPart123X456_000_01148142
   Content-Type: application/octet-stream;
   	name="Movie_0074.mpeg.pif"
   Content-Transfer-Encoding: base64
   Content-Disposition: attachment;
   	filename="Movie_0074.mpeg.pif"

   TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
   AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
   AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
   --CSmtpMsgPart123X456_000_01148142--

"spamoracle mark" is much like "bogofilter -p".  It copies the message
from stdin to stdout, adding a few header lines that are meant to be
used to classify the email later.  One new header line has spam
scores, of course.  The other added header line identifies the
attachments.  In this case, it looks like this:

   X-Attachments: type="application/octet-stream" name="Movie_0074.mpeg.pif" name="Movie_0074.mpeg.pif" 

This way I can write a procmail rule that discards every email
carrying a .pif attachment.  In procmail I can easily restrict the
rule to the email header, so an email like this one, which merely
talks about X-Attachments and spam in its body, won't get thrown in
the spam bucket just for mentioning them.
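
For anyone filtering outside procmail, the header-only check can be
sketched roughly as follows.  This is a minimal illustration in Python,
not a procmail recipe and not anything spamoracle ships; the function
name has_pif_attachment is made up for the example, and it assumes the
message has already been through "spamoracle mark" so that an
X-Attachments header is present.

```python
from email import message_from_string


def has_pif_attachment(raw_message: str) -> bool:
    """Return True if an X-Attachments header names a .pif file.

    Only the parsed header block is inspected, so a message whose
    body merely quotes an X-Attachments line is not flagged.
    """
    msg = message_from_string(raw_message)
    # get_all returns every header with this name, or the default if none exist.
    for value in msg.get_all("X-Attachments", []):
        # Match the closing quote too, so ".pif" must end the name.
        if '.pif"' in value.lower():
            return True
    return False
```

Because the parser stops reading headers at the first blank line, text
in the body never reaches the check.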

If there are multiple attachments, spamoracle just piles everything
together to get a longer X-Attachments line.  I rearranged the above
email to have two attachments and got:

   X-Attachments: type="application/octet-stream" name="Movie_0074.mpeg.pif" name="Movie_0074.mpeg.pif" type="application/octet-stream" name="zzz.z" name="blurgh" 

I think the redundant "name" fields are a consequence of having both a
name on the content-type of the attachment and a filename on the
content-disposition.  I think it's reasonable to keep them, since you
want to make all relevant information available to the procmail (or
whatever) filter.
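
To make the duplicate "name" fields concrete, here is a rough Python
sketch of how such a line could be assembled.  It is not spamoracle's
actual code; the function name attachments_header is hypothetical, and
so is the rule that a part counts as an attachment whenever it has a
name on its Content-Type or a filename on its Content-Disposition.

```python
from email import message_from_string
from email.utils import collapse_rfc2231_value


def attachments_header(raw_message: str) -> str:
    """Build an X-Attachments-style line from a raw MIME message."""
    msg = message_from_string(raw_message)
    fields = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no attachment data themselves
        # The two places a sender can put a file name.
        name = part.get_param("name")
        filename = part.get_param("filename", header="content-disposition")
        if name is None and filename is None:
            continue  # assumption: unnamed parts are not attachments
        fields.append('type="%s"' % part.get_content_type())
        # Emit both values, even when identical, so a later filter
        # sees everything the sender supplied.
        for value in (name, filename):
            if value is not None:
                fields.append('name="%s"' % collapse_rfc2231_value(value))
    return "X-Attachments: " + " ".join(fields)
```

With several attachments the fields simply accumulate on one line, in
the order the parts appear in the message.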

-- 
Tim Freeman                                                  tim at fungible.com
Which is worse: ignorance or apathy? Who knows? Who cares?
GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D  7180 76DF FE00 34B1 5C78 



