multipart spam

Boris 'pi' Piwinger 3.14 at piology.org
Sun Nov 14 10:20:04 CET 2004


"Chris Fortune" <cfortune at telus.net> wrote:

>>Somebody has probably mentioned this already, but there seems to be a growing trend in "hard to classify" spam lately: MIME emails
>>with two multipart sections: text and html.
>---We had this discussion already many times. Does not seem to be true.
>
>But it is true.  I could post several hundred samples.

Some people had those a year ago. Not really a new trend.

>>The payload of the mail is in the HTML section (consisting of images and urls), but the
>>text section is filled with either conversational text taken from books, etc, or -even worse- authentic "ham" e-mails,
>---What is that? The spammer cannot know what ham means for me.
>
>That is untrue for a site-wide (shared) wordlist.

You mean some spammers can access the mailboxes?

>>obviously sampled from somebody's sent folder.
>---So this is not known to my filter.
>
>The ham collected from a group of people is self-similar, like the vocabulary of a language.   

There might be some problem. This seems to be a general
limitation of site-wide filters. On the other hand, this
should generate more false positives.

>>Other than registering each and every one of these
>>mails, then retraining the wordlist, any suggestions?
>---Training to exhaustion:-))
>
>That's the problem.   I am training bogofilter to recognize hammy tokens as spam, then training it again to do the opposite, several
>times a day.

Obviously, I also receive this kind of spam. In practice it
is not a problem. Actually, the words spammers use as filler
just become unimportant, and the filter learns which words
really are good and bad.
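
To illustrate why filler words lose their weight, here is a rough
sketch assuming Robinson-style token scoring. The function names, the
parameters s, x and min_dev, and their values are assumptions chosen
for illustration, not bogofilter's actual code or defaults.

```python
def token_score(spam_count, ham_count, spam_msgs, ham_msgs, s=1.0, x=0.5):
    """Robinson-style score for one token: near 1.0 is spammy, near 0.0 is hammy.

    s (strength of the prior) and x (score assumed for an unseen token)
    are illustrative values, not taken from any real configuration.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never seen: fall back to the prior
    spam_freq = spam_count / spam_msgs
    ham_freq = ham_count / ham_msgs
    p = spam_freq / (spam_freq + ham_freq)
    # Pull the raw estimate p towards x when there is little evidence.
    return (s * x + n * p) / (s + n)


def is_informative(score, min_dev=0.375):
    """Tokens scoring too close to 0.5 are ignored when classifying."""
    return abs(score - 0.5) >= min_dev


# A filler word seen equally often in spam and ham ends up neutral,
# while a word seen only in spam stays a strong indicator.
filler = token_score(50, 50, 1000, 1000)
payload = token_score(40, 0, 1000, 1000)
print(is_informative(filler), is_informative(payload))
```

Under these assumptions, a word that shows up in both corpora drifts
towards 0.5 and simply drops out of the calculation, so registering
such mail as spam does not turn ordinary vocabulary into spam words.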

>>I guess this could also be the beginning of a thread about de-obfuscation.
>---What should that do?
>
>Reformat the email so that only the parts intended to be displayed to the recipient are included, for example.  The resulting email
>would then be used for classification.

I believe someone had something like that long ago on this
list. To some extent bogofilter does that.
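
A minimal sketch of that idea in Python, using only the standard
library. It assumes an HTML-capable mail client shows the HTML
alternative and never the plain-text one; the name displayed_text and
the helper class are made up for illustration and have nothing to do
with bogofilter's own code.

```python
from email import message_from_string
from email.policy import default
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes and link/image URLs, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        for name, value in attrs:
            # URLs are often the real payload, so keep them as tokens.
            if name in ("href", "src") and value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def displayed_text(raw_message):
    """Return only the text a recipient would plausibly see."""
    msg = message_from_string(raw_message, policy=default)
    # Prefer the HTML alternative, fall back to plain text.
    body = msg.get_body(preferencelist=("html", "plain"))
    if body is None:
        return ""
    content = body.get_content()
    if body.get_content_type() == "text/html":
        parser = _VisibleText()
        parser.feed(content)
        parser.close()
        return " ".join(parser.chunks)
    return content.strip()
```

The result could then be piped to the classifier instead of the raw
message, so the unseen plain-text alternative never reaches the
wordlist. Whether that helps in practice is a separate question.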

>Outgoing mail is certified Virus Free.

Well, a text message cannot contain a virus anyhow.

BTW a language question: Why "Virus Free" and not "virus
free"?

Also: Filters are limited. Who pays damages if that message
is wrong?

pi


