obscured URL not being tokenized
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Sat Dec 20 19:50:20 CET 2003
David Relson <relson at osagesoftware.com> wrote:
>> <a
>> href="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/i%6ed%65x.%68t%6dl"><img border="0"
>> src="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/e%6et.%6ap%67" width="500" height="300"></a>
>
>The %dd encodings are probably easy to deal with. I'll take a look at
>the code.
Hm, I entered that into my browser. AFAICS it is not valid
at all. Such an encoding is OK for the path and filename,
but not for the hostname. I am not sure it is worth the
effort.
>Dealing with color is more challenging. The knowledge that "#FFFFFF"
>means "white" and "#000000" means "black" is relatively easy. More
>difficult is that "#FEFFFF", "#FFFEFF", "#FFFFFE", and "#FEFEFE" are
>(for all intents and purposes) the same as #FFFFFF. However, "#808080"
>is clearly different. To do the job "right" calls for recognizing
>colors and judging sameness -- not trivial.
>
>I make no promises, but perhaps one day ...
I think trying to do so would be far too much for a
statistical filter.
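To make the "judging sameness" idea concrete, here is a minimal sketch that treats two `#RRGGBB` colors as the same when no channel differs by more than a fixed tolerance (a per-channel maximum, or Chebyshev, distance). The helper names, the tolerance of 8 out of 255, and the `color:white` / `color:black` tokens are all hypothetical assumptions. A perceptual distance, for example in CIELAB space, would be closer to doing the job "right".

```python
def parse_hex_color(s: str) -> tuple[int, int, int]:
    """Parse '#RRGGBB' into an (r, g, b) tuple of ints in 0..255."""
    s = s.lstrip("#")
    if len(s) != 6:
        raise ValueError(f"expected #RRGGBB, got {s!r}")
    r, g, b = (int(s[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def same_color(a: str, b: str, tolerance: int = 8) -> bool:
    """True if no channel differs by more than `tolerance` (assumed value)."""
    ca, cb = parse_hex_color(a), parse_hex_color(b)
    return max(abs(x - y) for x, y in zip(ca, cb)) <= tolerance


def color_token(s: str) -> str:
    """Hypothetical: collapse near-white and near-black into one token each."""
    if same_color(s, "#FFFFFF"):
        return "color:white"
    if same_color(s, "#000000"):
        return "color:black"
    return "color:" + s.upper()


for c in ("#FEFFFF", "#FFFEFF", "#FFFFFE", "#FEFEFE", "#808080"):
    print(c, color_token(c))
```

With this tolerance, the four near-white values from the quoted message all map to `color:white`, while `#808080` keeps its own token.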
pi