filter evasion

Stefan Mashkevich mash at mashke.org
Thu Nov 6 23:47:12 CET 2003


On Thu, 6 Nov 2003, John McCain wrote:

> I'm seeing messages slip through that have the following characteristics:
> 
> sp</ham>am
> 
> Which tokenize as:
> sp
> ham
> am

Same here. I also see 'sp<!ham>am' -- didn't even know one could write
HTML comments that way -- but that does already tokenize properly ('spam').
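
For illustration, here is a minimal Python sketch of the two behaviours. It is
not bogofilter's actual lexer; the function names, the regular expressions and
the BLOCK set are my own assumptions. It deletes tags that sit inside a word
and turns a few block-level tags into spaces, so that neighbouring words do not
run together:

```python
import re

# Assumed patterns for this sketch only (not taken from bogofilter).
# TAG matches <name ...>, </name ...> and <!name ...>, capturing the name.
TAG = re.compile(r"</?!?([A-Za-z0-9]*)[^>]*>")
WORD = re.compile(r"[A-Za-z0-9']+")

# Hypothetical list of tags that really do separate words.
BLOCK = {"br", "p", "div", "tr", "td", "li"}


def tokenize_naive(text):
    """Split on every non-word character, so tag names leak into the tokens."""
    return WORD.findall(text)


def strip_tags(text):
    """Replace block-level tags with a space and delete every other tag."""
    def repl(match):
        return " " if match.group(1).lower() in BLOCK else ""
    return TAG.sub(repl, text)


def tokenize_stripped(text):
    """Tokenize after tag removal, letting split words rejoin."""
    return WORD.findall(strip_tags(text))


if __name__ == "__main__":
    print(tokenize_naive("sp</ham>am"))
    print(tokenize_stripped("sp</ham>am"))
    print(tokenize_stripped("sp<!ham>am"))
```

Under these assumptions both 'sp</ham>am' and 'sp<!ham>am' come out as the
single token 'spam', while the naive version yields 'sp', 'ham', 'am'.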

In fact, over the last week or so there has been quite an invasion (of my
address, at least) of messages like this, and their style clearly indicates
that they share a common origin. Training helps only so much: after about
a dozen have been sent home, new ones are still arriving.
A small piece of one looks like this:

<br><font color=white>you evidently thought me mad.  Sir, you should never 
judge lightly
<br><font color=black size=1>Rem<!TFbs>oval Inform<!WzE>ation o<!WFLE>n
Si<!mNF>te</font>
<br><font color=white>time it revolves, one can in imagination follow the 
flow of that</font>

> What can be done about this?  I personally don't think it would be a great 
> loss to simply ignore all HTML closing tags.  I can't think of any other HTML 
> evil which could be perpetrated to do this any other sort of way without 
> seriously disrupting the text.

There are also those pieces of innocent text inside <font color=white>, which
is actually rather clever: when the message is rendered as HTML, none of that
noise is visible.
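
One conceivable countermeasure, sketched in Python with the standard
html.parser module: collect only the text a reader would actually see. This
is purely an illustration under loud assumptions: the background is taken to
be white, only <font color=...> is examined, and the class name, function name
and HIDDEN set are hypothetical, not part of bogofilter.

```python
from html.parser import HTMLParser

# Assumed spellings of "white"; a real filter would need many more cases.
HIDDEN = {"white", "#ffffff", "#fff"}


class VisibleText(HTMLParser):
    """Collect text, skipping anything inside a white <font> element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one flag per open <font>: is its text hidden?
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            color = dict(attrs).get("color")
            if color is None:
                # No colour given: inherit from the enclosing <font>, if any.
                hidden = self.stack[-1] if self.stack else False
            else:
                hidden = color.lower() in HIDDEN
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "font" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not (self.stack and self.stack[-1]):
            self.parts.append(data)


def visible_text(html):
    """Return the concatenated text outside white <font> elements."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = ("<font color=white>noise text</font>"
              "<font color=black size=1>Rem<!TFbs>oval</font>")
    print(visible_text(sample))
```

The parser consumes '<!TFbs>' as a comment-like declaration, so the two halves
of the word arrive as adjacent text chunks and rejoin when concatenated. A
sender could of course defeat this with a different background colour or
with style attributes, so it is a sketch of the idea rather than a defence.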

Anyone else seen such messages and/or fought them successfully?

                                                     Stefan





More information about the Bogofilter mailing list