obscured URL not being tokenized

David Relson relson at osagesoftware.com
Sat Dec 20 20:06:54 CET 2003


On Sat, 20 Dec 2003 19:50:20 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson <relson at osagesoftware.com> wrote:
> 
> >> <a 
> >>
> >href="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/i
> >%6ed%65x.%68t%6dl"><img border="0" >
> >src="http://%322%31.2%332.%316%30.1%305/%7a/s%69l%76e%72/f%61r%6d/e%
> >6et.%6ap%67" width="500" height="300"></a>
> >
> >The %dd encodings are probably easy to deal with.  I'll take a look
> >at the code.
> 
> Hm, I entered that into my browser. AFAICS this is not valid
> by any means. It will be OK to use such an encoding for path
> and filename, but not for the hostname. I am not sure it is
> worth the effort.
> 
> >Dealing with color is more challenging.  The knowledge that "#FFFFFF"
> >means "white" and "#000000" means "black" is relatively easy.  More
> >difficult is that "#FEFFFF", "#FFFEFF", "#FFFFFE", and "#FEFEFE" are
> >(for all intents and purposes) the same as #FFFFFF.  However,
> >"#808080" is clearly different.  To do the job "right" calls for
> >recognizing colors and  judging sameness -- not trivial.
> >
> >I make no promises, but perhaps one day ...
> 
> I think trying to do so would be really too much in the
> sense of a statistical filter.
> 
> pi

pi,

As mentioned in a previous message, color info appears in a number of
tags.  Examples include:

	<body bgcolor="white" text="black">
	<font color="#000000">
	<TABLE BGCOLOR="#cccccc">
	<td="#9EBAC6">
	<td bgcolor=lightblue>

Likely, the most value for the least effort would come from recognizing
000000 as black and FFFFFF as white.
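
To make the idea concrete, here is a rough sketch (Python, not
bogofilter code) of how color values might be folded into a few tokens.
Everything in it is an assumption for illustration: the names
parse_color and color_token, the tiny table of named colors, and the
tolerance of 8 per channel used to decide "close enough to white/black".

```python
import re

# Hypothetical table with a handful of named colors; a real table
# would need many more entries.
NAMED = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "lightblue": (173, 216, 230),
}

# Six hex digits, with or without the leading '#'.
HEX_RE = re.compile(r"#?([0-9a-f]{6})")


def parse_color(value):
    """Return (r, g, b) for a known name or 6-digit hex value, else None."""
    value = value.strip().strip("\"'").lower()
    if value in NAMED:
        return NAMED[value]
    m = HEX_RE.fullmatch(value)
    if m is None:
        return None
    digits = m.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_token(value, tolerance=8):
    """Map a color attribute value to a token.

    Values within `tolerance` of pure white or pure black on every
    channel collapse to "white" or "black"; anything else is returned
    as a normalized lowercase hex string.  Unparseable values give None.
    """
    rgb = parse_color(value)
    if rgb is None:
        return None
    if all(c >= 255 - tolerance for c in rgb):
        return "white"
    if all(c <= tolerance for c in rgb):
        return "black"
    return "#%02x%02x%02x" % rgb
```

With this sketch, "#FEFFFF" and "#FEFEFE" both come out as "white",
"000000" comes out as "black", and "#808080" stays distinct.  Whether a
fixed per-channel tolerance is a sensible notion of "sameness" is an
open question.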

Question:  what other tags allow color info?

David





More information about the Bogofilter mailing list