multipart spam

Chris Fortune cfortune at telus.net
Sun Nov 14 10:10:56 CET 2004


>Somebody has probably mentioned this already, but there seems to be a growing trend in "hard to classify" spam lately: MIME emails
>with two multipart sections: text and html.
---We have had this discussion many times already. It does not seem to be true.

But it is true.  I could post several hundred samples.


>The payload of the mail is in the HTML section (consisting of images and URLs), but the
>text section is filled with either conversational text taken from books, etc., or, even worse, authentic "ham" e-mails,
---What is that? The spammer cannot know what ham means for me.

That is untrue for a site-wide (shared) wordlist.


>obviously sampled from somebody's sent folder.
---So this is not known to my filter.

The ham collected from a group of people is self-similar, like the vocabulary of a language.   A shared wordlist is filled with the
shared "languages" of ham and spam.  The group wordlist is surprisingly similar to an individual's wordlist, except that there are
many more tokens, and tokens that would be strong ham or spam indicators in a personal wordlist are "watered down".
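
To illustrate the "watering down", here is a minimal sketch in Python. It uses a simplified per-token ratio, not bogofilter's actual calculation, and the token counts for the two hypothetical users are made up for the example.

```python
def spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Simplified per-token spamminess: spam rate / (spam rate + ham rate).

    This is an illustrative ratio only, not bogofilter's real formula.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def pooled_probability(users):
    """Merge several personal wordlists into one shared count, then score."""
    spam_hits = sum(u["spam_hits"] for u in users)
    spam_total = sum(u["spam_total"] for u in users)
    ham_hits = sum(u["ham_hits"] for u in users)
    ham_total = sum(u["ham_total"] for u in users)
    return spam_probability(spam_hits, spam_total, ham_hits, ham_total)


# Hypothetical counts for one token, e.g. "invoice".
# User A sees it only in ham; user B sees it only in spam.
user_a = {"spam_hits": 0, "spam_total": 100, "ham_hits": 40, "ham_total": 100}
user_b = {"spam_hits": 30, "spam_total": 100, "ham_hits": 0, "ham_total": 100}

p_a = spam_probability(user_a["spam_hits"], user_a["spam_total"],
                       user_a["ham_hits"], user_a["ham_total"])
p_b = spam_probability(user_b["spam_hits"], user_b["spam_total"],
                       user_b["ham_hits"], user_b["ham_total"])
p_shared = pooled_probability([user_a, user_b])
```

In each personal wordlist the token is a decisive indicator, but in the pooled wordlist it lands near the middle and contributes little to the score.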


>The result is a very low bogosity score.
---Actually, I never see those problems. It might be that your training was somehow insufficient.

You probably don't see those problems because your bogofilter is personal to your ham and your spam.



>Other than registering each and every one of these
>mails, then retraining the wordlist, any suggestions?
---Training to exhaustion:-))

That's the problem. Several times a day, I train bogofilter to recognize hammy tokens as spam, then train it again to do the
opposite.



>I guess this could also be the beginning of a thread about de-obfuscation.
---What should that do?

For example, reformat the email so that it includes only the parts intended to be displayed to the recipient.  The resulting email
would then be used for classification.
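
A rough sketch of what such a preprocessing step might look like, in Python with the standard email and html.parser modules. It assumes the recipient's mail reader shows the HTML part whenever one exists, so the text part is kept only as a fallback. The names displayed_content and VisibleContent are hypothetical and not part of bogofilter.

```python
from email import message_from_string
from html.parser import HTMLParser


class VisibleContent(HTMLParser):
    """Collect visible text plus href/src URLs from an HTML body."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.chunks.append(value)  # URLs are part of the payload

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._hidden_depth:
            self.chunks.append(text)


def _decode(part):
    """Return the part's body as text, tolerating bad charsets."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name
        return payload.decode("utf-8", errors="replace")


def displayed_content(raw_message):
    """Return only what the recipient is assumed to see.

    Assumption: an HTML part, if present, is what gets displayed,
    so any accompanying text/plain part is ignored.
    """
    msg = message_from_string(raw_message)
    html_part = text_part = None
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype == "text/html" and html_part is None:
            html_part = part
        elif ctype == "text/plain" and text_part is None:
            text_part = part
    if html_part is not None:
        parser = VisibleContent()
        parser.feed(_decode(html_part))
        parser.close()
        return "\n".join(parser.chunks)
    if text_part is not None:
        return _decode(text_part)
    return ""
```

If the output of displayed_content, rather than the raw message, were handed to the classifier, the decoy text section would never contribute tokens.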







