Do mailers honor text/plain when the message is obviously HTML?

Tim Freeman tim at fungible.com
Sun Nov 2 17:55:20 CET 2003


From: Matthias Andree <matthias.andree at gmx.de>
>Fine. So we have his HTML junk as plain text tokens, where they usually
>don't show up. Train bogofilter with that mail and spammer begone.

Doesn't work.  Most of his HTML junk is made-up garbage tokens like
"</treadmill>" that will surely be completely different next time.
Meanwhile, the leading HTML keywords like "doctype", "body", and
"title" are currently non-spam signals in my BF database, because I
have enough legitimate emails, like the one I'm writing now, that
mention those words.
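
As a rough illustration of why training on this mail doesn't help,
here is a minimal sketch of Robinson-style token smoothing of the kind
Bogofilter applies.  The counts, the robx/robs values, and the helper
names are all made up for the example and are not taken from my actual
database or from Bogofilter's source.

```python
# Hypothetical wordlist: token -> (spam_count, ham_count).  Counts are invented.
WORDLIST = {
    "doctype": (3, 40),
    "body": (5, 60),
    "</treadmill>": (1, 0),  # the made-up tag, after training on one spam
}
SPAM_MSGS = 100  # assumed number of spam messages trained
HAM_MSGS = 100   # assumed number of non-spam messages trained


def robinson_f(spam_count, ham_count, robx=0.5, robs=1.0):
    """Smoothed spam estimate for one token.

    robx is the score assumed for a never-seen token and robs is the
    weight given to that assumption; both values here are illustrative.
    """
    n = spam_count + ham_count
    if n == 0:
        return robx  # no evidence at all: the token is neutral
    spam_rate = spam_count / SPAM_MSGS
    ham_rate = ham_count / HAM_MSGS
    p = spam_rate / (spam_rate + ham_rate)
    return (robs * robx + n * p) / (robs + n)


def score(token):
    """Look the token up in the sketch wordlist and score it."""
    spam_count, ham_count = WORDLIST.get(token, (0, 0))
    return robinson_f(spam_count, ham_count)


if __name__ == "__main__":
    for tok in ("</treadmill>", "</bicycle>", "doctype", "body"):
        print(f"{tok:15s} {score(tok):.3f}")
```

Under these assumptions the trained "</treadmill>" token scores as
spammy, but a fresh made-up tag such as "</bicycle>" falls back to the
neutral robx value, while "doctype" and "body" still pull the message
toward non-spam.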

tim at fungible.com (Tim Freeman) writes:
> Any opinions on what environment Bogofilter should assume here?

The score so far is that AOL and Hotmail treat HTML in text/plain as
HTML, and Yahoo treats it as plain text.  The big question is what
Outlook and other mainstream Windows mailers do, since that's probably
what spammers test against.

Well, this has only happened once, so I'll let it drop for a while.
If the spammer feels rewarded by it, it will happen many more times
and I'll see a bunch of similar false negatives.  Then it might be
time to take the path suggested in relson's email:

From: David Relson <relson at osagesoftware.com>
>You have the source code and can modify the lexer to recognize the
>DOCTYPE.  After that, you can run some tests to see what happens.  It
>would be interesting to have actual data, rather than our
>speculations.
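
A minimal sketch of the kind of check such an experiment might start
from, written in Python rather than in Bogofilter's actual lexer.  The
function name and the list of tags it looks for are assumptions for
illustration only.

```python
import re

# Matches a body whose first non-blank content is an HTML doctype or one
# of a few top-level tags.  The tag list is an illustrative guess.
_HTML_START = re.compile(
    r"\s*<(?:!doctype\s+html|html|head|body)\b",
    re.IGNORECASE,
)


def looks_like_html(body: str) -> bool:
    """Return True if a text/plain body appears to start with HTML markup."""
    return _HTML_START.match(body) is not None


if __name__ == "__main__":
    samples = [
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"><html><body>',
        'Most of his junk is tokens like "</treadmill>" and doctype.',
    ]
    for s in samples:
        print(looks_like_html(s), s[:40])
```

Because it only matches at the start of the body, an ordinary message
that merely mentions "doctype" or "body" in its text is not flagged.
Whether treating flagged bodies as HTML actually improves scoring is
exactly the thing that would need testing.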

-- 
Tim Freeman                                                  tim at fungible.com
GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D  7180 76DF FE00 34B1 5C78 
Computers don't like it when you anthropomorphize them. -- Chris Phoenix




More information about the bogofilter-dev mailing list