ignore text/plain part of multipart/alternative messages?

David Flanagan david at davidflanagan.com
Mon Aug 11 20:51:02 CEST 2003


The biggest category of spam that's been getting through to me is
multipart/alternative messages that contain text apparently excerpted
from books in the text/plain part, and whatever the spammer's payload is
in the text/html part.

In Paul Graham's latest article, he asserts that this type of spam isn't
a big deal because the text/plain camouflage doesn't actually look like
real e-mail.  I'm not sure I agree: the ones that are getting through to
me seem to be excerpts from political memoirs or something about the
Reagan/Bush years.  Since I get a lot of legitimate e-mail griping
about the current Bush administration, this spam gets through to me.

In any case, I think there is a (simple?) solution.  For
multipart/alternative messages, I think that only the part the mail
reader displays by default should be tokenized.  I'm sure that 99% of
the mail readers out there display the text/html part of these messages,
and no spammer is going to
send spam in the text/plain part.  So don't even bother tokenizing that
part: just skip to the payload.
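Here is a minimal sketch of the idea. It uses Python's standard email package, not Bogofilter's own C MIME code, which isn't shown here. text_to_tokenize is a hypothetical helper. For multipart/alternative it keeps only the last alternative, because RFC 2046 lists alternatives in order of increasing faithfulness and says a reader should pick the last one it supports. It assumes an HTML-capable reader, and the sample message is made up.

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser


def text_to_tokenize(msg: EmailMessage) -> str:
    """Hypothetical helper: return the text a filter would tokenize if it
    kept only the displayed alternative of multipart/alternative messages."""
    if msg.get_content_type() == "multipart/alternative":
        # RFC 2046 orders alternatives from plainest to richest, and a reader
        # normally shows the last one it can render.  This sketch assumes an
        # HTML-capable reader and simply keeps the last alternative.
        alternatives = list(msg.iter_parts())
        return text_to_tokenize(alternatives[-1]) if alternatives else ""
    if msg.is_multipart():
        # Any other container (multipart/mixed, multipart/related, ...):
        # tokenize every child, as a filter would without this change.
        return "\n".join(text_to_tokenize(part) for part in msg.iter_parts())
    if msg.get_content_maintype() == "text":
        # Undoes the transfer encoding and charset, returning a str.
        return msg.get_content()
    # Non-text leaves (images, application/*) are skipped in this sketch.
    return ""


# Usage: a made-up message shaped like the spam described above.
RAW = """\
From: spammer@example.com
To: reader@example.com
Subject: you won't believe this
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

An excerpt lifted from a political memoir.
--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body>Buy cheap meds now!</body></html>
--XYZ--
"""

msg = Parser(policy=policy.default).parsestr(RAW)
text = text_to_tokenize(msg)
print(text)
```

With this approach, the book-excerpt camouflage in the text/plain part never reaches the tokenizer, and only the HTML payload is scored. A real filter would still have to decide what to do when the last alternative is a type it doesn't handle.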

Bogofilter already groks MIME, doesn't it?  So this should be easy to do,
shouldn't it?  (Not that I'm volunteering...)  Anyone have a
counterargument for why it shouldn't be done?

	David Flanagan
