multipart spam
Pavel Kankovsky
peak at argo.troja.mff.cuni.cz
Sun Nov 14 20:37:36 CET 2004
On Sun, 14 Nov 2004, Matthias Andree wrote:
> We _can_ skip fixed types in multipart/alternative with a little effort
> - but it must be a fixed list such as "ignore text/plain in
> multipart/alternative" - the parser is single-pass and not supposed to
> store several MB of mail in RAM.
"ignore text/plain in multipart/alternative" would fail on a
multipart/alternative containing only text/plain.
Nevertheless, single-pass parsing with acceptable memory consumption is
possible even with a more complex policy. When you encounter
multipart/alternative, you save the current (multi)set of words,
and you roll back to that saved state whenever you find a better
alternative.
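A rough sketch of the save/roll-back idea, in Python rather than
Bogofilter's own C. Everything named here (AlternativeTokenizer, the
PREFERENCE ranking, the begin_*/end_* hooks) is made up for illustration
and is not Bogofilter code. It assumes the parser sees each part's
Content-Type before the part's body, and it does not handle nested
multipart/alternative.

```python
from collections import Counter

# Hypothetical ranking: a higher number means "more likely to be displayed".
PREFERENCE = {"text/plain": 1, "text/html": 2}


class AlternativeTokenizer:
    """Collects word counts in one pass, keeping only the best alternative."""

    def __init__(self):
        self.words = Counter()   # current (multi)set of words
        self._saved = None       # snapshot taken at multipart/alternative
        self._best = -1          # rank of the best alternative seen so far
        self._accept = True      # whether words of the current part count

    def begin_alternative(self):
        # Save the state as it was before any alternative was read.
        self._saved = self.words.copy()
        self._best = -1

    def begin_part(self, content_type):
        if self._saved is None:
            self._accept = True          # not inside multipart/alternative
            return
        rank = PREFERENCE.get(content_type, 0)
        if rank > self._best:
            # Better alternative: drop the words of the previous one.
            self.words = self._saved.copy()
            self._best = rank
            self._accept = True
        else:
            self._accept = False         # not better: ignore its words

    def add_words(self, words):
        if self._accept:
            self.words.update(words)

    def end_alternative(self):
        self._saved = None
        self._accept = True
```

Memory use is bounded by two word tables (the snapshot and the working
copy), not by the size of the mail. Because the first part is always
accepted, a multipart/alternative containing text/plain only still
contributes its words.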
A different approach would detect differences between alternatives as
"metatokens". An ordinary multipart/alternative should contain more or
less the same text (+/- HTML and similar crud). A deceptive
multipart/alternative contains two (or more) different texts and
Bogofilter might be able to learn to recognize it.
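One way the "metatoken" idea could look, again as a Python sketch and
not anything Bogofilter does today. The token names (meta:alt-mismatch,
meta:alt-consistent), the use of Jaccard similarity, and the 0.3
threshold are all assumptions picked for the example. It expects the
words of each alternative with markup already stripped.

```python
def jaccard(a, b):
    """Similarity of two word sets: |a & b| / |a | b|, 1.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def alternative_metatokens(parts, threshold=0.3):
    """Return a metatoken saying whether the alternatives agree.

    parts: list of word lists, one per alternative.
    threshold: assumed cut-off, not a tuned value.
    """
    sets = [{w.lower() for w in words} for words in parts]
    # Lowest pairwise similarity; 1.0 when there are fewer than two parts.
    lowest = min(
        (jaccard(sets[i], sets[j])
         for i in range(len(sets))
         for j in range(i + 1, len(sets))),
        default=1.0,
    )
    if lowest < threshold:
        return ["meta:alt-mismatch"]
    return ["meta:alt-consistent"]
```

The returned token would simply be added to the message's word list, so
the classifier can learn from training data how strongly a mismatch
correlates with spam. This needs one set of distinct words per
alternative, not the raw text of the parts.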
> > Reformat the email so that only the parts intended to be displayed to
> > the recipient are included, for example. The resulting email would
> > then be used for classification.
>
> That won't do site-wide. I am using mutt to display the "plain" part
> from multipart mail. [...]
I don't think Mutt users are interesting targets from the spammers' POV.
Their primary targets are lusers with luser-friendly MUAs preferring
blinking and colourful HTML to boring plain text.
--Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."