Encoded filenames

David Relson relson at osagesoftware.com
Wed Dec 1 13:12:57 CET 2004


On Wed, 01 Dec 2004 11:57:17 +0100
Matthias Andree wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > As to RFC-2047, _something_ is needed to decode encoded words like
> > =?charset?Q?...?=  The current use of text_decode() for this looks
> > fine to me.  I don't see the recursion you've mentioned.   Can you
> > send me a test case?
> 
> I don't have one off-hand.
> 
> The problem is:
> 
> 1. lexer sees an encoded word
> 
> 2. lexer uses text_decode and prepends its result to the input - which
>    may again be an encoded word that was encoded to "escape" it (Gnus
>    does that, for instance, and it's the right thing to do)
> 
> 3. lexer passes it through text_decode again for the same input;
>    this is a recursion and hence the bug

This is not recursion.  Recursion would involve text_decode() being
called from within text_decode().  What we have is the possibility of
a second (improper) call on text that has already been decoded once.

Here's a test case for nested strings:

cat <<EOF | bogolexer -p -xml -vvvv
Subject: =?charset?q?string1?=
Subject: =?charset1?q?=3D=3Fcharset2=3FQ=3Fstring2=3F=3D?=
EOF
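
To make the nested case concrete, here is a minimal sketch of a single
"Q" decoding pass.  It is not bogofilter's text_decode(); the names
hex_value(), q_decode() and looks_like_encoded_word() are made up for
illustration, and charset handling is ignored entirely.  The point is
only that one pass over an escaped payload such as
=3D=3Fcharset2=3FQ=3Fstring2=3F=3D yields text that itself looks like
an encoded word, which is what a second pass must not touch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode the encoded-text part of an RFC 2047 "Q" word.
 * "=XX" becomes the byte 0xXX, "_" becomes a space, and everything
 * else is copied unchanged.  out must hold strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL. */
static size_t q_decode(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '=' &&
            hex_value((unsigned char)in[1]) >= 0 &&
            hex_value((unsigned char)in[2]) >= 0) {
            out[n++] = (char)(hex_value((unsigned char)in[1]) * 16 +
                              hex_value((unsigned char)in[2]));
            in += 3;
        } else if (*in == '_') {
            out[n++] = ' ';
            in++;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}

/* Crude shape check only: starts with "=?" and ends with "?=". */
static int looks_like_encoded_word(const char *s)
{
    size_t len = strlen(s);

    return len >= 4 && s[0] == '=' && s[1] == '?' &&
           s[len - 2] == '?' && s[len - 1] == '=';
}
```

Feeding the payload above through q_decode() once gives
=?charset2?Q?string2?=, and looks_like_encoded_word() accepts it, so a
lexer that rescans its own decoded output would match it a second time.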

> I have no idea how to make this work EFFECTIVELY within the lexer,
> hence I postulate that text_decode must not be called from the lexer
> but earlier, in yyinput() at the latest.

Calling text_decode earlier requires matching the RFC-2047 pattern.  As
pattern matching is a lexer responsibility, doing it earlier is wrong.
If it _really_ mattered (which I doubt), a lexer mode (flag) could be
used.  Setting it upon calling text_decode() and clearing it upon seeing
a new header line might work.
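
A rough sketch of what I mean by the flag, assuming nothing about the
real lexer: lexer_mode_t, should_decode() and on_new_header_line() are
hypothetical names, not existing bogofilter code.  The lexer would ask
should_decode() whenever the RFC-2047 pattern matches and would call
on_new_header_line() at the start of each header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexer state; not taken from the bogofilter sources. */
typedef struct {
    bool decoded_in_header; /* true once text_decode() has run for this header */
} lexer_mode_t;

/* Called when the lexer matches an encoded word.
 * Returns true if the caller should decode it, false if a decode has
 * already happened for the current header line. */
static bool should_decode(lexer_mode_t *m)
{
    assert(m != NULL);
    if (m->decoded_in_header)
        return false;
    m->decoded_in_header = true;   /* set upon calling text_decode() */
    return true;
}

/* Called when the lexer sees the start of a new header line. */
static void on_new_header_line(lexer_mode_t *m)
{
    assert(m != NULL);
    m->decoded_in_header = false;  /* clear for the next header */
}
```

As sketched, the flag would also suppress decoding of a second,
legitimate encoded word on the same header line, so a real version
would probably need a finer-grained reset than "new header line".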


