PATCH to ignore uuencoded attachments

Matthias Andree matthias.andree at gmx.de
Sat Nov 27 01:19:18 CET 2004


David Relson <relson at osagesoftware.com> writes:

> I've attached a patch for lexer_v3.l that I think will help you.  
>
> Checking my archives (approx 350,000 messages), I found 6 messages with
> "begin 666" in them, and 3 of those were from Oct 2003 when you posted
> this very same problem.  The lack of test cases means I can't thoroughly
> test the patch.  Please give it a good workout and let me know if the
> patch works for you.

I'm not comfortable with the patch.

Mails that try to hide content from Outlook Express are in the wild, and
we mustn't let ourselves be fooled by them.

> +<TEXT>begin\ 666\ .*  				{ BEGIN UUENCODED; }
> +<UUENCODED>end$					{ BEGIN TEXT; }

These must be anchored to the beginning of the line with "^" (which is
also more efficient).

A zero-length line, that is, one matching the regexp ^[` ]$, also ends
the uuencoded part.

Microsoft also appears to accept uuencode if the mode is the empty
string, i.e. "begin  filename" (with two blanks).
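As a rough C illustration of these delimiter rules, the hypothetical helpers below test a single line, passed without its line terminator. They are not bogofilter code; in lexer_v3.l the same effect would come from anchored flex patterns. The begin check is anchored at the start of the line and accepts a mode of zero to three octal digits, so the empty-mode form matches too.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does `line` open a uuencoded block?
 * Matches "begin <mode> <name>" at the start of the line, where <mode>
 * is 0-3 octal digits, so "begin  name" (two blanks) is accepted too. */
bool is_uu_begin_line(const char *line)
{
    if (strncmp(line, "begin ", 6) != 0)
        return false;                 /* anchored: like ^begin\  */
    const char *p = line + 6;
    int digits = 0;
    while (digits < 3 && *p >= '0' && *p <= '7') {
        p++;
        digits++;
    }
    if (*p != ' ')
        return false;                 /* mode must be followed by a blank */
    return p[1] != '\0';              /* and then a non-empty file name */
}

/* Hypothetical helper: does `line` close the uuencoded block?
 * Either the literal "end" or a zero-length data line, i.e. ^[` ]$. */
bool is_uu_end_line(const char *line)
{
    return strcmp(line, "end") == 0
        || strcmp(line, "`") == 0
        || strcmp(line, " ") == 0;
}
```

Because both checks compare whole lines from the first character, text such as "the end" or "beginning of the story" does not switch states.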

> +<UUENCODED>{TOKEN}				/* ignore tokens */
> +<UUENCODED>\${NUM}(\.{NUM})?			/* ignore money */

*shrug* These shouldn't happen, and $ is a valid character in uuencoded
mode.

Actually, we would want to check that we really are seeing uuencoded
lines. That would take some custom function that switches back with
BEGIN TEXT; if a line is corrupt, to defang the mails that confuse
Outlook (such as the "I-love-you-signature", a fake virus meant to
scare Outlook Express users).

UUencode, like base64, transforms 3 octets into 4 sextets, but uses a
different alphabet: it just adds 0x20 to each sextet. Hence decoding
is easy: o = (i & 0x3f) ^ 0x20, or equivalently o = (i - 0x20) & 0x3f.

Either formula allows 0x20 (space) to be replaced by 0x60, which is `
(backquote) in ASCII, ISO-8859 and UTF-8.
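A small C check of the two decoding formulas. The function names are made up for illustration. Both formulas map 0x20 and 0x60 to 0 and 'M' to 45, and they agree on every character from 0x20 through 0x60, the whole uuencode alphabet.

```c
#include <stdbool.h>

/* Decode one uuencoded character: o = (i & 0x3f) ^ 0x20 */
unsigned uu_decode_xor(unsigned char c)
{
    return (c & 0x3fu) ^ 0x20u;
}

/* Decode one uuencoded character: o = (i - 0x20) & 0x3f */
unsigned uu_decode_sub(unsigned char c)
{
    return (unsigned)(c - 0x20) & 0x3fu;
}

/* Verify that both formulas agree over 0x20 (space) .. 0x60 (backquote). */
bool uu_formulas_agree(void)
{
    for (unsigned c = 0x20; c <= 0x60; c++)
        if (uu_decode_xor((unsigned char)c) != uu_decode_sub((unsigned char)c))
            return false;
    return true;
}
```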

A uuencoded line starts with the encoded length byte. This counts the
decoded octets, so we commonly see M here, which decodes to 0x2d, or 45.
45 octets mean 60 sextets, hence 60 encoded bytes follow an M. This is
also the maximum allowed length.
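The line check suggested earlier, which would fall back with BEGIN TEXT; when a line is not really uuencoded, could be built on these length rules. Below is a minimal sketch, assuming the line arrives without its terminator; is_uu_data_line is a hypothetical name. It is deliberately strict: some encoders append a checksum character, and some transports strip trailing blanks, and this version would reject such lines. A false rejection only means the content gets tokenized as text, which is the safer direction for a spam filter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decode one uuencoded character; 0x20 and 0x60 (`) both yield 0. */
static unsigned uu_decode_char(unsigned char c)
{
    return (unsigned)(c - 0x20) & 0x3fu;
}

/* Hypothetical check: is `line` (len bytes, no terminator) a plausible
 * uuencoded data line? */
bool is_uu_data_line(const char *line, size_t len)
{
    if (len == 0)
        return false;                     /* need at least the length byte */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)line[i];
        if (c < 0x20 || c > 0x60)
            return false;                 /* outside the uuencode alphabet */
    }
    unsigned octets = uu_decode_char((unsigned char)line[0]);
    if (octets > 45)
        return false;                     /* 'M' (45 octets) is the maximum */
    size_t expected = ((octets + 2) / 3) * 4;  /* 3 octets -> 4 characters */
    return len - 1 == expected;
}
```

Ordinary prose fails almost immediately, because lowercase letters (0x61 and up) lie outside the uuencode alphabet. A zero-length line such as "`" passes as a valid data line whose length byte decodes to 0.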

Some systems habitually write a zero-length line to mark the end of
the encoded data, as mentioned above.


Then there is also the Base64 variant of uuencode (I don't know whether
Microsoft supports it). It starts with a line matching
^begin-base64 [0-7]?[0-7]?[0-7] .*$ and ends with a line matching
^====$; in between is regular RFC 2045-compatible base64 with no more
than 76 encoded characters per line. I wouldn't bother with this now,
it's quite rare.
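Should it ever become worth handling, the begin-base64 framing could be recognized along the same lines. Again a hypothetical sketch with made-up names, with lines passed without their terminator. The caller is assumed to test is_b64_end_line first, because "====" also consists only of base64 characters.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper for ^begin-base64 [0-7]?[0-7]?[0-7] .*$ */
bool is_b64_begin_line(const char *line)
{
    if (strncmp(line, "begin-base64 ", 13) != 0)
        return false;
    const char *p = line + 13;
    int digits = 0;
    while (digits < 3 && *p >= '0' && *p <= '7') {
        p++;
        digits++;
    }
    return digits >= 1 && *p == ' ';   /* 1-3 octal digits, then a blank */
}

/* Hypothetical helper for ^====$ */
bool is_b64_end_line(const char *line)
{
    return strcmp(line, "====") == 0;
}

/* Hypothetical helper: 1-76 characters from the base64 alphabet. */
bool is_b64_data_line(const char *line)
{
    size_t len = strlen(line);
    if (len == 0 || len > 76)
        return false;
    for (size_t i = 0; i < len; i++) {
        char c = line[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
               || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
        if (!ok)
            return false;
    }
    return true;
}
```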

-- 
Matthias Andree




More information about the bogofilter-dev mailing list