multipart spam

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Sun Nov 14 20:37:36 CET 2004


On Sun, 14 Nov 2004, Matthias Andree wrote:

> We _can_ skip fixed types in multipart/alternative with a little effort
> - but it must be a fixed list such as "ignore text/plain in
> multipart/alternative" - the parser is single-pass and not supposed to
> store several MB of mail in RAM.

"ignore text/plain in multipart/alternative" would fail on a
multipart/alternative containing only text/plain: nothing would be
left to classify.

Nevertheless, single-pass parsing with acceptable memory consumption is
possible even with a more complex policy. When you encounter
multipart/alternative, you save the current (multi)set of words,
and you roll back to that saved state whenever you find a better
alternative. Only the word counts are stored, not the mail itself.
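
A minimal sketch of the save-and-roll-back idea, in Python rather than
Bogofilter's own code. Everything here is an assumption made for
illustration: the event format, the names collect_words and PREFERENCE,
and the ranking of text/html above text/plain. Nested
multipart/alternative is not handled.

```python
from collections import Counter

# Hypothetical ranking: a higher number means "more likely to be displayed".
PREFERENCE = {"text/plain": 1, "text/html": 2}


def collect_words(events):
    """Consume parser events in one pass and return the word multiset.

    Hypothetical event format:
      ("begin_alternative",)
      ("part", content_type, [word, ...])
      ("end_alternative",)
    """
    words = Counter()
    saved = None      # word counts as they were when the alternative began
    best_rank = -1    # rank of the best alternative counted so far
    for event in events:
        kind = event[0]
        if kind == "begin_alternative":
            saved, best_rank = words.copy(), -1
        elif kind == "end_alternative":
            saved = None
        elif kind == "part":
            _, content_type, part_words = event
            if saved is not None:
                rank = PREFERENCE.get(content_type, 0)
                if rank <= best_rank:
                    continue              # a better alternative is already counted
                words = saved.copy()      # roll back to the saved state
                best_rank = rank
            words.update(part_words)
    return words
```

A multipart/alternative that contains text/plain only still gets its
words counted, because the first alternative seen is always accepted.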

A different approach would detect differences between alternatives as
"metatokens". An ordinary multipart/alternative should contain more or
less the same text (+/- HTML and similar crud). A deceptive
multipart/alternative contains two (or more) different texts and
Bogofilter might be able to learn to recognize it.
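
One way the metatoken idea could look, again as a hedged Python sketch
and not anything Bogofilter implements. The Jaccard similarity measure,
the 0.5 threshold, and the token name "meta:alternatives-differ" are all
assumptions chosen for illustration. It assumes the word list of each
alternative is available with markup already stripped.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def alternative_metatokens(alternatives, threshold=0.5):
    """Return a metatoken if any two alternatives share too few words.

    alternatives: list of word lists, one per alternative part.
    threshold: hypothetical cut-off below which texts count as different.
    """
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            if jaccard(alternatives[i], alternatives[j]) < threshold:
                return ["meta:alternatives-differ"]
    return []
```

The returned token would simply be added to the message's word set, so
the classifier can learn from training whether it correlates with spam.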

> > Reformat the email so that only the parts intended to be displayed to
> > the recipient are included, for example.  The resulting email would
> > then be used for classification.
> 
> That won't do site-wide. I am using mutt to display the "plain" part
> from multipart mail. [...]

I don't think Mutt users are interesting targets from the spammers' POV.
Their primary targets are lusers with luser-friendly MUAs that prefer
blinking and colourful HTML to boring plain text.

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."



