obscured URL not being tokenized
David Relson
relson at osagesoftware.com
Sat Dec 20 20:06:54 CET 2003
On Sat, 20 Dec 2003 19:50:20 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson <relson at osagesoftware.com> wrote:
>
> >> <a href="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/i%6ed%65x.%68t%6dl">
> >> <img border="0" src="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/e%6et.%6ap%67"
> >> width="500" height="300"></a>
> >
> >The %dd encodings are probably easy to deal with. I'll take a look
> >at the code.
>
> Hm, I entered that into my browser. AFAICS this is not valid
> by any means. It will be OK to use such an encoding for path
> and filename, but not for the hostname. I am not sure it is
> worth the effort.
>
> >Dealing with color is more challenging. The knowledge that "#FFFFFF"
> >means "white" and "#000000" means "black" is relatively easy. More
> >difficult is that "#FEFFFF", "#FFFEFF", "#FFFFFE", and "#FEFEFE" are
> >(for all intents and purposes) the same as #FFFFFF. However,
> >"#808080" is clearly different. To do the job "right" calls for
> >recognizing colors and judging sameness -- not trivial.
> >
> >I make no promises, but perhaps one day ...
>
> I think trying to do so would be really too much in the
> sense of a statistical filter.
>
> pi
pi,
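On the %dd encodings in the sample quoted above: here is a minimal sketch, in Python rather than bogofilter's C, of undoing the escapes so the URL can be tokenized as ordinary text. The helper name decode_obscured_url is made up for illustration, and nothing here reflects how bogofilter actually handles URLs. It just leans on the standard library's urllib.parse.unquote.

```python
from urllib.parse import unquote


def decode_obscured_url(url: str) -> str:
    """Replace every %dd escape with the character it encodes.

    Hypothetical helper for illustration only; it decodes the whole
    string, hostname included, since the goal is tokenizing, not
    validating the URL.
    """
    return unquote(url)


# The href from the quoted sample, joined back onto one line.
obscured = ("http://%322%31.2%332.%316%30.1%305"
            "/%7a/s%69l%76e%72/f%61r%6d/i%6ed%65x.%68t%6dl")

print(decode_obscured_url(obscured))
# prints http://221.232.160.105/z/silver/farm/index.html
```

Once decoded, the hostname and path produce the same tokens as an unobscured URL would.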
As mentioned in a previous message, color info appears in a number of
tags. Examples include:
<body bgcolor="white" text="black">
<font color="#000000">
<TABLE BGCOLOR="#cccccc">
<td="#9EBAC6">
<td bgcolor=lightblue>
Likely, the most value for the least effort would come from recognizing
000000 as black and FFFFFF as white.
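As a rough sketch of that idea (Python for brevity, not bogofilter code), the function below maps a color attribute value to "white", "black", or a normalized "#rrggbb" token. The name color_token, the tolerance parameter, and the two-entry name table are all assumptions made for illustration. A tolerance of 1 per channel treats values like #FEFFFF as white.

```python
import re

# Illustrative subset only; a real table would need many more names.
NAMED_COLORS = {"white": "ffffff", "black": "000000"}

HEX_COLOR = re.compile(r"#?([0-9a-f]{6})")


def color_token(value, tolerance=1):
    """Return 'white', 'black', '#rrggbb', or None if unrecognized.

    A color counts as white when every channel is within `tolerance`
    of 0xFF, and as black when every channel is within `tolerance`
    of 0x00.
    """
    v = value.strip().strip("\"'").lower()
    v = NAMED_COLORS.get(v, v)
    m = HEX_COLOR.fullmatch(v)
    if m is None:
        return None  # unknown name or malformed value
    hexval = m.group(1)
    channels = [int(hexval[i:i + 2], 16) for i in (0, 2, 4)]
    if all(c >= 255 - tolerance for c in channels):
        return "white"
    if all(c <= tolerance for c in channels):
        return "black"
    return "#" + hexval


for sample in ("#FFFFFF", "#FEFFFF", "#000000", "#808080", "white"):
    print(sample, "->", color_token(sample))
```

With the default tolerance, "#FEFEFE" comes out as "white" while "#808080" stays a distinct token. Names outside the small table, such as lightblue, return None here.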
Question: what other tags allow color info?
David