filter evasion
Stefan Mashkevich
mash at mashke.org
Thu Nov 6 23:47:12 CET 2003
On Thu, 6 Nov 2003, John McCain wrote:
> I'm seeing messages slip through that have the following characteristics:
>
> sp</ham>am
>
> Which tokenize as:
> sp
> ham
> am
Same here. I also see 'sp<!ham>am' -- didn't even know one could write
HTML comments that way -- but that does already tokenize properly ('spam').
In fact, in the last week or so there has been quite an invasion (of my
address, at least) of messages like this, their style clearly indicating
that they share a common origin. Training seems to help only so much:
after about a dozen have been sent home, new ones are still arriving.
A small piece of one looks like this:
<br><font color=white>you evidently thought me mad. Sir, you should never
judge lightly
<br><font color=black size=1>Rem<!TFbs>oval Inform<!WzE>ation o<!WFLE>n
Si<!mNF>te</font>
<br><font color=white>time it revolves, one can in imagination follow the
flow of that</font>
> What can be done about this? I personally don't think it would be a great
> loss to simply ignore all html closing tags. I can't think of any other HTML
> evil which could be perpetrated to do this any other sort of way without
> seriously disrupting the text.
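To make the suggestion above concrete, here is a rough sketch in Python of
what "ignore the tags that sit inside a word" could look like. It is not
bogofilter's actual lexer (which is written in C). The names INWORD_TAG,
ANY_TAG, WORD and tokenize are made up for illustration, and the rule that
a closing tag or a '<!...>' pseudo-comment between two word characters gets
deleted rather than treated as a word break is an assumption.

```python
import re

# Hypothetical pattern: a closing tag (</x>) or a "<!...>" pseudo-comment
# with a word character on both sides, i.e. one that sits inside a word.
INWORD_TAG = re.compile(r"(?<=\w)<[/!][^<>]*>(?=\w)")
# Every remaining tag is treated as a word break.
ANY_TAG = re.compile(r"<[^<>]*>")
# Deliberately simple notion of a "word" for this sketch.
WORD = re.compile(r"[A-Za-z0-9']+")


def tokenize(html):
    """Return lowercase word tokens, rejoining words split by in-word tags."""
    text = INWORD_TAG.sub("", html)   # sp</ham>am -> spam
    text = ANY_TAG.sub(" ", text)     # <br>, <font ...>, </font> -> space
    return [w.lower() for w in WORD.findall(text)]


print(tokenize("sp</ham>am"))
# ['spam']
print(tokenize("Rem<!TFbs>oval Inform<!WzE>ation o<!WFLE>n Si<!mNF>te</font>"))
# ['removal', 'information', 'on', 'site']
```

The price is that legitimate markup such as 'bold</b>text', written without
a space, would also be glued into one token; whether that loss is acceptable
is exactly the judgment call made in the quoted text.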
There are also those pieces of innocent text inside <font color=white> --
which is not a stupid trick at all. When you view the message as HTML, not
one piece of that noise is actually visible.
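One conceivable countermeasure, sketched here in Python purely as an
illustration, is to drop whatever sits inside a white <font> tag before
tokenizing. This is not something bogofilter is known to do. The function
name visible_text is hypothetical, the sketch assumes a white background
(which the message itself could change), and it only tracks the most recent
<font> tag instead of nesting properly, because the sample above leaves its
first <font> unclosed.

```python
import re

# Matches opening and closing <font> tags; group 1 is "/" for a closing
# tag, group 2 holds the attribute text of an opening tag.
FONT_TAG = re.compile(r"<(/?)font\b([^<>]*)>", re.IGNORECASE)
# Assumption: only the literal colour name "white" counts as invisible.
WHITE = re.compile(r"""color\s*=\s*["']?white\b""", re.IGNORECASE)


def visible_text(html):
    """Return html with <font> tags and white-on-white text removed."""
    out = []
    hidden = False   # are we currently inside a white <font>?
    pos = 0
    for m in FONT_TAG.finditer(html):
        if not hidden:
            out.append(html[pos:m.start()])
        pos = m.end()
        if m.group(1):
            # </font>: assume we are back to visible text
            hidden = False
        else:
            # <font ...>: visibility depends on this tag's own colour
            hidden = bool(WHITE.search(m.group(2)))
    if not hidden:
        out.append(html[pos:])
    return "".join(out)


sample = (
    "<br><font color=white>you evidently thought me mad.\n"
    "<br><font color=black size=1>Rem<!TFbs>oval Inform<!WzE>ation</font>\n"
    "<br><font color=white>time it revolves</font>"
)
print(visible_text(sample))
```

On that shortened sample only the 'Rem<!TFbs>oval Inform<!WzE>ation' part
(plus the <br> tags) survives, so the innocent filler never reaches the
tokenizer. A spammer could of course switch to near-white colours or to
style attributes, so this would be a stopgap at best.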
Anyone else seen such messages and/or fought them successfully?
Stefan