Marking Attachments [was: chinese-korean-non_latin spam]
Tim Freeman
tim at fungible.com
Sat Mar 8 07:02:24 CET 2003
>Marking attachments ... Sounds like an interesting concept. Would you
>care to elaborate on what you want to have happen?
Here's an example. This message came in carrying an executable
(probably a worm). I've ripped out many of the irrelevant header lines
and most of the executable portion:
From: <big at boss.com>
To: <censored at censored>
Subject: Re: Movies
Date: Tue, 4 Feb 2003 3:11:54 +0100
Importance: Normal
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="CSmtpMsgPart123X456_000_01148142"

This is a multipart message in MIME format

--CSmtpMsgPart123X456_000_01148142
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Attached file:

--CSmtpMsgPart123X456_000_01148142
Content-Type: application/octet-stream;
	name="Movie_0074.mpeg.pif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
	filename="Movie_0074.mpeg.pif"

TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
--CSmtpMsgPart123X456_000_01148142--
"spamoracle mark" is much like "bogofilter -p". It copies the message
from stdin to stdout, adding a few header lines that are meant to be
used to classify the email later. One new header line has spam
scores, of course. The other added header line identifies the
attachments. In this case, it looks like this:
X-Attachments: type="application/octet-stream" name="Movie_0074.mpeg.pif" name="Movie_0074.mpeg.pif"
This way I can write a procmail rule that discards all emails
that contain .pif attachments. In procmail I can easily restrict the
rule to the email header, so emails like the present one, which merely
talk about X-Attachments and spam in the body, aren't going to get
thrown in the spam bucket simply because of that.
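
To make the header-only idea concrete, here is a minimal sketch in
Python rather than procmail. It is not a procmail recipe and not part
of spamoracle or bogofilter; the function name and the extension list
are hypothetical, chosen only for illustration. It assumes the message
has already been marked, so an X-Attachments header is present.

```python
import re
from email import message_from_string

# Hypothetical block list for illustration; extend as needed.
BLOCKED_EXTENSIONS = (".pif",)


def has_blocked_attachment(raw_message: str) -> bool:
    """Return True if the X-Attachments header names a blocked file type.

    Only the header is consulted, so a message that merely mentions
    X-Attachments or .pif files in its body is not flagged.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Attachments", "")
    # Pull out every name="..." field from the header value.
    names = re.findall(r'name="([^"]*)"', header)
    return any(n.lower().endswith(BLOCKED_EXTENSIONS) for n in names)
```

A message whose only mention of X-Attachments is in the body comes
back False, which is the same behaviour I want from the procmail rule.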
If there are multiple attachments, spamoracle just piles everything
together to get a longer X-Attachments line. I rearranged the above
email to have two attachments and got:
X-Attachments: type="application/octet-stream" name="Movie_0074.mpeg.pif" name="Movie_0074.mpeg.pif" type="application/octet-stream" name="zzz.z" name="blurgh"
I think the redundant "name" fields are a consequence of having both a
name on the content-type of the attachment and a filename on the
content-disposition. I think it's reasonable to keep them, since you
want to make all relevant information available to the procmail (or
whatever) filter.
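
For anyone who wants to experiment, here is a rough sketch of how such
a header could be assembled with Python's standard email package. This
is an assumption about the general approach, not spamoracle's actual
code; the function name is hypothetical, and it only looks at parts
whose Content-Disposition is "attachment".

```python
from email import message_from_string
from email.utils import collapse_rfc2231_value


def attachments_header(raw_message: str) -> str:
    """Build an X-Attachments style line from a raw MIME message.

    Both the Content-Type "name" and the Content-Disposition "filename"
    are reported as name="..." fields, which is why the same name can
    show up twice for one attachment.
    """
    msg = message_from_string(raw_message)
    fields = []
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        fields.append(f'type="{part.get_content_type()}"')
        # Name given on the Content-Type header, if any.
        name = part.get_param("name")
        if name:
            fields.append(f'name="{collapse_rfc2231_value(name)}"')
        # Filename given on the Content-Disposition header, if any.
        filename = part.get_param("filename", header="content-disposition")
        if filename:
            fields.append(f'name="{collapse_rfc2231_value(filename)}"')
    return "X-Attachments: " + " ".join(fields)
```

With several attachments the fields are simply appended one after
another, giving one long line like the one shown above.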
--
Tim Freeman tim at fungible.com
Which is worse: ignorance or apathy? Who knows? Who cares?
GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D 7180 76DF FE00 34B1 5C78