Parsing stuff like %2E in URLs

David Relson relson at osagesoftware.com
Mon Jan 10 23:47:57 CET 2005


On Mon, 10 Jan 2005 14:44:03 -0500
Matt Garretson wrote:

> Hello, i've noticed that bogofilter (including 0.93.4) is parsing
> % escaped hex values in a way that is unexpected to me.  I'm not
> saying it's wrong; just that i'd been expecting different results. :)
> 
> A very simplified message is attached as an example, along with the
> bogolexer output i'm getting.  Basically "%2Estring" is being
> tokenized as "Estring".  However, "%40string" becomes just "string",
> as i'd expect.  (The difference appears to be whether the second
> hex digit is alpha or numeric.)
> 
> Is all this as expected?

Hi Matt,

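Maybe yes, maybe no.  Is the URL in a MIME body part identified as
text/html?  If so, bogofilter's parser has a rule for handling %hex.  If
the URL appears as part of plain text, then the normal parsing rules
apply and the percent sign and leading digits are ignored.  That would
explain why "%2Estring" becomes "Estring" (the "%2" is dropped) while
"%40string" becomes "string" (the "%40" is dropped).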
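
To make the plain-text case concrete, here is a rough Python sketch.  It
is not bogofilter's lexer, only a model of the rule described above; the
function names are made up for illustration.  The second function shows
what standard percent-decoding yields for the same input, purely for
comparison; it is an assumption, not a statement of what bogofilter's
text/html rule produces.

```python
import re
from urllib.parse import unquote


def plain_text_token(text: str) -> str:
    """Hypothetical model of the plain-text rule: drop each '%'
    together with any decimal digits that directly follow it."""
    return re.sub(r"%\d*", "", text)


def percent_decoded(text: str) -> str:
    """Standard URL percent-decoding, shown only for comparison."""
    return unquote(text)


for sample in ("%2Estring", "%40string"):
    # Print the input, the modelled plain-text token, and the decoded form.
    print(sample, plain_text_token(sample), percent_decoded(sample))
```

Under this model the "E" in "%2E" survives because it is a letter, not a
digit, which matches the alpha versus numeric difference you noticed.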

Add the line "Content-Type: text/html; charset=iso-8859-1" just after
the "From: " line of your test message and rerun bogolexer to see the
difference.

Now the question is: did the original message specify text/html or
text/plain?

Regards,

David


