multipart spam
Boris 'pi' Piwinger
3.14 at piology.org
Sun Nov 14 10:20:04 CET 2004
"Chris Fortune" <cfortune at telus.net> wrote:
>>Somebody has probably mentioned this already, but there seems to be a growing trend in "hard to classify" spam lately: MIME emails
>>with two multipart sections: text and html.
>---We had this discussion already many times. Does not seem to be true.
>
>But it is true. I could post several hundred samples.
Some people had those a year ago. Not really a new trend.
>>The payload of the mail is in the HTML section (consisting of images and urls), but the
>>text section is filled with either conversational text taken from books, etc, or -even worse- authentic "ham" e-mails,
>---What is that? The spammer cannot know what ham means for me.
>
>That is untrue for a site-wide (shared) wordlist.
You mean some spam can access the mailboxes?
>>obviously sampled from somebody's sent folder.
>---So this is not known to my filter.
>
>The ham collected from a group of people is self-similar, like the vocabulary of a language.
There might be some problem. This seems to be a general
limitation of site-wide filters. On the other hand, this
should generate more false positives.
>>Other than registering each and every one of these
>>mails, then retraining the wordlist, any suggestions?
>---Training to exhaustion:-))
>
>That's the problem. I am training bogofilter to recognize hammy tokens as spam, then training it again to do the opposite, several
>times a day.
Obviously, I also receive this kind of spam. In practice it
is not a problem. Actually, those words spammers use there
just become unimportant and the filter learns what really
are good and bad words.
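
A minimal sketch of why such words become unimportant, assuming a
Robinson-style token score of the kind bogofilter is built around. The
function names and the values of s, x and min_dev are illustrative
assumptions, not bogofilter's actual code or defaults:

```python
def token_score(spam_count, ham_count, spam_msgs, ham_msgs, s=1.0, x=0.5):
    """Robinson-style spamicity of one token (illustrative sketch).

    s is the weight given to the prior and x is the score assumed for a
    token that has never been seen; both values here are assumptions.
    """
    n = spam_count + ham_count
    if n == 0:
        return x
    bad = spam_count / spam_msgs    # fraction of spam messages with the token
    good = ham_count / ham_msgs     # fraction of ham messages with the token
    p = bad / (bad + good)
    return (s * x + n * p) / (s + n)


def is_significant(score, min_dev=0.1):
    """Keep only tokens whose score is far enough from the neutral 0.5."""
    return abs(score - 0.5) >= min_dev


# A word seen equally often in spam and ham ends up neutral and is ignored.
camouflage = token_score(50, 50, 100, 100)
# A word seen only in spam keeps a high score and still counts.
payload = token_score(40, 0, 100, 100)
```

With equal counts on both sides the camouflage word scores 0.5 and
drops out, while the payload word stays close to 1.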
>>I guess this could also be the beginning of a thread about de-obfuscation.
>---What should that do?
>
>Reformat the email so that only the parts intended to be displayed to the recipient are included, for example. The resulting email
>would then be used for classification.
I believe someone had something like that long ago on this
list. To some extent bogofilter does that.
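
A rough sketch of the idea quoted above, using Python's standard email
package. It assumes the recipient's mail reader prefers the HTML
alternative over the plain-text one; displayed_text is a made-up helper
name, and this is not how bogofilter itself parses mail:

```python
from email import message_from_string
from email.policy import default


def displayed_text(raw):
    """Return only the body part a typical mail reader would show.

    Assumption: the reader prefers the HTML alternative and falls back
    to plain text when there is no HTML part.
    """
    msg = message_from_string(raw, policy=default)
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        return ""
    return part.get_content()
```

The returned text, rather than the whole message, would then be handed
to the classifier, so the camouflage in the text/plain part never
reaches the wordlist.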
>Outgoing mail is certified Virus Free.
Well, a text message cannot contain a virus anyhow.
BTW a language question: Why "Virus Free" and not "virus
free"?
Also: Filters are limited. Who pays damages if that message
is wrong?
pi