Encoded filenames
David Relson
relson at osagesoftware.com
Wed Dec 1 13:12:57 CET 2004
On Wed, 01 Dec 2004 11:57:17 +0100
Matthias Andree wrote:
> David Relson <relson at osagesoftware.com> writes:
>
> > As to RFC-2047, _something_ is needed to decode encoded words like
> > =?charset?Q?...?= The current use of text_decode() for this looks
> > fine to me. I don't see the recursion you've mentioned. Can you
> > send me a test case?
>
> I don't have one off-hand.
>
> The problem is:
>
> 1. lexer sees an encoded word
>
> 2. lexer uses text_decode and prepends its result to the input - which
>    may again be an encoded word that was encoded to "escape" it (Gnus
>    does that, for instance, and it's the right thing to do)
>
> 3. lexer passes it through text_decode again for the same input
>
> this is a recursion and hence the bug
This is not recursion. Recursion would involve text_decode() being
called from within text_decode(). We have the possibility of a
second (improper) call.

Here's a test case for nested strings:
cat <<EOF | bogolexer -p -xml -vvvv
Subject: =?charset?q?string1?=
Subject: =?charset1?q?=3D=3Fcharset2=3FQ=3Fstring2=3F=3D?=
EOF
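
To make the "second (improper) call" concrete, here is a minimal
stand-alone sketch in Python. It is not bogofilter's code: ENCODED_WORD,
q_decode() and decode_once() are hypothetical stand-ins for the lexer
pattern and text_decode(), and only the Q encoding is handled. The
sample header is written so that the inner word ends in "?=" (encoded
as =3F=3D).

```python
import re

# Simplified RFC 2047 encoded-word pattern: =?charset?Q?text?=
# (assumption: Q encoding only, charset is ignored in this sketch)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[Qq]\?([^?\s]*)\?=")


def q_decode(text):
    """Undo Q encoding: '_' means space, '=XX' is a hex-encoded character."""
    text = text.replace("_", " ")
    return re.sub(r"=([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


def decode_once(header):
    """Decode every encoded word in the string exactly one time."""
    return ENCODED_WORD.sub(lambda m: q_decode(m.group(1)), header)


nested = "=?charset1?q?=3D=3Fcharset2=3FQ=3Fstring2=3F=3D?="

first = decode_once(nested)   # the proper call
second = decode_once(first)   # the improper second call on decoded text

print(first)    # =?charset2?Q?string2?=
print(second)   # string2
```

The first call yields text that merely looks like an encoded word. If
that text is fed through the decoder again, the sender's escaping is
lost, which is the behavior under discussion.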
> I have no idea how to make this work EFFECTIVELY within the lexer,
> hence I postulate that text_decode must not be called from the lexer
> but earlier, in yyinput() at the latest.
Calling text_decode earlier requires matching the RFC-2047 pattern. As
pattern matching is a lexer responsibility, doing it earlier is wrong.

If it _really_ mattered (which I doubt), a lexer mode (flag) could be
used. Setting it upon calling text_decode() and clearing it upon seeing
a new header line might work.
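
A rough sketch of the flag idea, again in Python rather than in the
lexer itself. GuardedDecoder, its "decoded" attribute and
new_header_line() are hypothetical names, and the pattern is the same
simplified Q-only one. It assumes all encoded words of a header line are
decoded in a single pass before the flag is set.

```python
import re

# Simplified RFC 2047 pattern; only the Q encoding is handled in this sketch.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[Qq]\?([^?\s]*)\?=")


def q_decode(text):
    """Undo Q encoding: '_' means space, '=XX' is a hex-encoded character."""
    text = text.replace("_", " ")
    return re.sub(r"=([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


class GuardedDecoder:
    """Hypothetical stand-in for the lexer's decoding step plus a mode flag."""

    def __init__(self):
        self.decoded = False  # the proposed lexer mode (flag)

    def new_header_line(self):
        # Seeing a new header line clears the flag.
        self.decoded = False

    def decode(self, text):
        if self.decoded:
            # Something on this header line was already decoded:
            # pass the text through untouched.
            return text
        result, count = ENCODED_WORD.subn(lambda m: q_decode(m.group(1)), text)
        if count:
            self.decoded = True  # set upon an actual decode
        return result


lexer = GuardedDecoder()
once = lexer.decode("=?charset1?q?=3D=3Fcharset2=3FQ=3Fstring2=3F=3D?=")
again = lexer.decode(once)     # flag is set, so nothing is decoded twice
lexer.new_header_line()
fresh = lexer.decode("=?charset?q?string1?=")

print(once)    # =?charset2?Q?string2?=
print(again)   # =?charset2?Q?string2?=
print(fresh)   # string1
```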