spaced out spam words

Tony L. Svanstrom tony at svanstrom.com
Sat Jun 10 12:05:12 CEST 2006


On Fri, 9 Jun 2006 the voices made David Relson write:

DR> HTML is a complex issue.  There are lots of tricks possible, for
DR> example bogus tags and putting single letters in each cell of a
DR> table.  HTML also allows "camouflaged" text (think white on white)
DR> that a human won't see but a computer program will.  I'm unaware of
DR> algorithms for successfully dealing with camo.

 Considering that you'd have to be as close as possible to 100% compatible with
the current web browsers out there (otherwise there'll be hacks to get around the
filter), you would probably have to use a web browser to render the page, and
then use OCR to compare the resulting image with the source (to find out whether
there's hidden text, probably also using the source to improve the OCR's
accuracy).
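
 For contrast, here is roughly what a purely static check would look like: a
minimal Python sketch, not anything bogofilter actually does. It only reads
inline style attributes and flags text whose foreground colour string equals
the effective background colour string. The names CamoFinder and
find_camouflaged_text are made up for illustration. It compares colours as
literal strings, so "#fff" on "white" slips straight through, and it knows
nothing about stylesheets or scripts, which is exactly why a real renderer
would be needed.

```python
from html.parser import HTMLParser


def _style_to_dict(style):
    """Parse an inline style attribute into a {property: value} dict."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


class CamoFinder(HTMLParser):
    """Collect text whose inline foreground colour equals its background.

    Hypothetical helper: assumes black-on-white defaults and treats the
    parent's background as the effective background of a child element.
    """

    # Elements that never get a closing tag, so they must not be stacked.
    VOID = {"br", "img", "hr", "meta", "input", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []   # one (foreground, background) pair per open element
        self.hidden = []  # text runs that look camouflaged

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        fg, bg = self.stack[-1] if self.stack else ("black", "white")
        style = _style_to_dict(dict(attrs).get("style") or "")
        fg = style.get("color", fg)
        bg = style.get("background-color", bg)
        self.stack.append((fg, bg))

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            fg, bg = self.stack[-1]
            if fg == bg:  # literal string comparison only
                self.hidden.append(text)


def find_camouflaged_text(html):
    """Return the list of text runs flagged as same-colour-on-same-colour."""
    finder = CamoFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden
```

 Feeding it a message body where one span sets "color: white" on a white
background returns just the text of that span; write the same colour as "#fff"
and the check sees nothing.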

 But, of course, you could get around that simply by using JavaScript to set
the web page to the size of the screen and then using a :hover rule on the main
body tag to show/hide material.


And people wonder why I block HTML emails on most of my email accounts... =/


	/Tony
-- 
        /\___/\                                          /\___/\
        \_@ @_/                                          \_@ @_/
   .--oOO-(_)-OOo--------------------------------------oOO-(_)-OOo--.
   |  perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.com/t`'  |
   `---ôôô---ôôô----------------------------------------ôôô---ôôô---´
       \O/   \O/        ©1998-2005 svanstrom.com        \O/   \O/




More information about the Bogofilter mailing list