ignore text/plain part of multipart/alternative messages?
David Flanagan
david at davidflanagan.com
Mon Aug 11 20:51:02 CEST 2003
The biggest category of spam that's been getting through to me is
multipart/alternative messages that contain text apparently excerpted
from books in the text/plain part, and whatever the spammer's payload is
in the text/html part.
In Paul Graham's latest article, he asserts that this type of spam isn't
a big deal because the text/plain camouflage doesn't actually look like
real e-mail. I'm not sure I agree: the ones that are getting through to
me seem to be excerpts from political memoirs or something about the
Reagan/Bush years. Since I get a lot of legitimate e-mail griping
about the current Bush administration, this spam gets through to me.
In any case, I think there is a (simple?) solution. For
multipart/alternative messages, I think that only the default part (the
one a mail reader actually displays) should be tokenized. I'm sure that
99% of the mail-readers out there
display the text/html part of these messages, and no spammer is going to
send spam in the text/plain part. So don't even bother tokenizing that
part: just skip to the payload.
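
To make the idea concrete, here is a minimal sketch in Python using the
standard email package. It is not bogofilter's code (bogofilter is written
in C), and the function names displayed_parts and tokenize_displayed are
made up for illustration. It assumes the reader shows text/html when a
multipart/alternative message offers it, and otherwise falls back to the
last alternative. The tokenizer is deliberately crude and does not strip
HTML tags.

```python
import re
from email import message_from_string
from email.message import Message


def displayed_parts(msg: Message):
    """Yield the leaf parts an HTML-capable reader would likely display."""
    if msg.is_multipart() and msg.get_content_type() == "multipart/alternative":
        children = msg.get_payload()
        # Assumption: prefer text/html if offered; otherwise take the last
        # alternative, which RFC 2046 treats as the preferred form.
        chosen = next(
            (p for p in children if p.get_content_type() == "text/html"),
            children[-1] if children else None,
        )
        if chosen is not None:
            yield from displayed_parts(chosen)
    elif msg.is_multipart():
        # Other multipart types: every child may be shown, so visit them all.
        for child in msg.get_payload():
            yield from displayed_parts(child)
    else:
        yield msg


def tokenize_displayed(raw: str):
    """Tokenize only the text parts selected by displayed_parts()."""
    words = []
    for part in displayed_parts(message_from_string(raw)):
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: latin-1 can decode any byte sequence.
            text = payload.decode("latin-1")
        words.extend(re.findall(r"[a-z0-9$'-]+", text.lower()))
    return words
```

Fed a multipart/alternative message whose text/plain part is a memoir
excerpt and whose text/html part is the sales pitch, tokenize_displayed
returns words from the pitch only, so the camouflage never reaches the
classifier.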
Bogofilter already groks MIME, doesn't it? So this should be easy to do,
shouldn't it? (Not that I'm volunteering...) Anyone have a
counterargument for why it shouldn't be done?
David Flanagan