obscured URL not being tokenized

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Sat Dec 20 19:50:20 CET 2003


David Relson <relson at osagesoftware.com> wrote:

>> <a 
>> href="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/i%6ed%65x.%68t%6dl"><img border="0" 
>> src="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/e%6et.%6ap%67" width="500" height="300"></a>
>
>The %dd encodings are probably easy to deal with.  I'll take a look at
>the code.

Hm, I entered that into my browser. AFAICS this is not valid
by any means. Percent-encoding is OK for the path and
filename, but not for the hostname. I am not sure it is
worth the effort.
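
For reference, undoing the %dd escapes themselves is a small step. Below is a minimal sketch in Python, not bogofilter's actual code; the helper name decode_obscured_url is made up for illustration, and the input is the href from the quoted message. It only shows what the tokenizer would see after decoding.

```python
from urllib.parse import unquote, urlsplit


def decode_obscured_url(url: str) -> str:
    """Replace every %XX escape with the character it stands for."""
    return unquote(url)


# The href from the quoted message, copied verbatim.
obscured = (
    "http://%322%31.2%332.%316%30.1%305"
    "/%7a/s%69l%76e%72/f%61r%6d/i%6ed%65x.%68t%6dl"
)

decoded = decode_obscured_url(obscured)

# Once decoded, the host part can be split off and tokenized like any other.
host = urlsplit(decoded).hostname

print(decoded)
print(host)
```

After decoding, the host is an ordinary dotted-quad address and the path is plain ASCII, so both would yield the same tokens as an unobscured URL.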

>Dealing with color is more challenging.  The knowledge that "#FFFFFF"
>means "white" and "#000000" means "black" is relatively easy.  More
>difficult is that "#FEFFFF", "#FFFEFF", "#FFFFFE", and "#FEFEFE" are
>(for all intents and purposes) the same as #FFFFFF.  However, "#808080"
>is clearly different.  To do the job "right" calls for recognizing
>colors and  judging sameness -- not trivial.
>
>I make no promises, but perhaps one day ...

I think trying to do so would really go beyond what a
statistical filter should be doing.
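
To make concrete what "judging sameness" would involve, here is a rough sketch in Python. It is only an illustration under assumptions: the function names are hypothetical, the per-channel tolerance of 8 is an arbitrary choice, and only the six-digit "#RRGGBB" form is handled (no color names, no three-digit shorthand). Nothing like this exists in bogofilter as far as this thread says.

```python
def parse_hex_color(value: str) -> tuple:
    """Turn '#RRGGBB' into an (r, g, b) tuple of ints in 0..255."""
    digits = value[1:] if value.startswith("#") else value
    if len(digits) != 6:
        raise ValueError("expected a color of the form #RRGGBB: %r" % value)
    red = int(digits[0:2], 16)
    green = int(digits[2:4], 16)
    blue = int(digits[4:6], 16)
    return (red, green, blue)


def nearly_same(first: str, second: str, tolerance: int = 8) -> bool:
    """True if every channel differs by at most `tolerance` (assumed value)."""
    a = parse_hex_color(first)
    b = parse_hex_color(second)
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


# The examples from the quoted message, compared against white.
for color in ("#FEFFFF", "#FFFEFF", "#FFFFFE", "#FEFEFE", "#808080"):
    print(color, nearly_same(color, "#FFFFFF"))
```

A per-channel difference is the crudest possible measure; it ignores how the eye perceives color, which is part of why doing the job "right" is not trivial.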

pi



