further text_decode() issues
matthias.andree at gmx.de
Fri Oct 8 22:42:06 EDT 2004
I have some more issues with text_decode:
1. Why are we decoding RFC-2047 recursively? That doesn't seem right to me.
(Cause: we push the RFC-2047-decoded text back into our input
queue, and nothing stops it from being RFC-2047-decoded again.)
echo 'header: =?US-ASCII?Q?=3D=3FUS-ASCII=3FQ=3F=3D3D=3D3FUS-ASCII=3D3FQ=3D3Ftest=3D3F=3D3D=3F=3D?=' | bogolexer
Yes, I know this encoded word isn't RFC-2047-conformant because it
exceeds the 75-character limit for encoded words; bogofilter doesn't care.
2. The parser apparently isn't robust: it assumes that the
encoded word is well-formed. What if it isn't? I haven't yet
managed to break it since the fix, but I haven't tried for long.
Maybe lexer_v3 helps avoid the bugs the code still has.
If someone can come up with a test case that breaks bogolexer's
RFC-2047 decoder in bogofilter's current CVS version (trunk version,
not txn branch!), please let me know.
3. We should probably unfold the input header lines before the
lexer processes them. We could then also remove the RFC-2047 rules from
lexer_v3.l. (I cannot do this before Sunday.)
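Regarding points 1 and 2, here is a minimal sketch of a single-pass, validating decoder for one Q-encoded word, assuming the caller has already isolated the word. It is not bogofilter's text_decode(); decode_q_word and hexval are hypothetical names, and B (base64) encoding is left out. In this sketch the structure of the word is checked before anything is decoded, malformed input returns -1 so the caller can keep the raw text, and the result goes to a separate buffer that is never fed back to the lexer, so each word is decoded exactly once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10; /* lenient: RFC 2047 wants upper case */
    return -1;
}

/*
 * Hypothetical sketch, not bogofilter's text_decode(): decode one
 * Q-encoded word "=?charset?Q?encoded-text?=" of len bytes into out
 * (outsz bytes, NUL-terminated). Returns the decoded length, or -1 if
 * the word is malformed or out is too small. The result is written to
 * out only; nothing is pushed back into an input queue, so "=?...?="
 * sequences that appear in the decoded text are never decoded again.
 */
static long decode_q_word(const char *w, size_t len, char *out, size_t outsz)
{
    const char *end = w + len;
    const char *p;
    size_t n = 0;

    /* "=?" ... "?=" delimiters; "=?c?Q?x?=" (9 bytes) is the shortest word */
    if (len < 9 || memcmp(w, "=?", 2) != 0 || memcmp(end - 2, "?=", 2) != 0)
        return -1;
    end -= 2;                       /* end now points at the closing "?=" */

    /* charset: one or more bytes up to the next '?', no spaces or controls */
    for (p = w + 2; p < end && *p != '?'; p++)
        if ((unsigned char)*p <= ' ')
            return -1;
    if (p == w + 2 || p == end)
        return -1;

    /* encoding: "?Q?" or "?q?", followed by at least one byte of text */
    if (end - p < 4 || (p[1] != 'Q' && p[1] != 'q') || p[2] != '?')
        return -1;

    for (p += 3; p < end; p++) {
        int c = (unsigned char)*p;

        if (c == '?' || c <= ' ' || c > '~')
            return -1;              /* not allowed in encoded-text */
        if (c == '_') {
            c = ' ';                /* '_' stands for a space (RFC 2047, 4.2) */
        } else if (c == '=') {
            int hi, lo;
            if (end - p < 3 || (hi = hexval(p[1])) < 0 || (lo = hexval(p[2])) < 0)
                return -1;          /* truncated or invalid "=XX" escape */
            c = hi * 16 + lo;
            p += 2;
        }
        if (n + 1 >= outsz)
            return -1;              /* no room for this byte plus the NUL */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return (long)n;
}
```

Applied to the encoded word from the bogolexer example in point 1, one call yields `=?US-ASCII?Q?=3D=3FUS-ASCII=3FQ=3Ftest=3F=3D?=`, which this sketch would leave as literal text instead of decoding it two more times down to `test`.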
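For point 3, a minimal sketch of header unfolding as RFC 2822 (section 2.2.3) defines it: remove every line break that is immediately followed by a space or tab. unfold_header is a hypothetical name, not an existing bogofilter function, and the sketch assumes input may use either CRLF or bare LF line endings.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of bogofilter: unfold a header in place
 * by removing every line break (CRLF or bare LF) that is immediately
 * followed by a space or tab. The whitespace that starts the
 * continuation line is kept. buf must hold at least len + 1 bytes; the
 * result is NUL-terminated and its new length is returned.
 */
static size_t unfold_header(char *buf, size_t len)
{
    size_t r = 0, w = 0;

    while (r < len) {
        size_t brk = 0;             /* length of a line break at r, if any */

        if (buf[r] == '\n')
            brk = 1;
        else if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            brk = 2;

        if (brk != 0 && r + brk < len &&
            (buf[r + brk] == ' ' || buf[r + brk] == '\t')) {
            r += brk;               /* drop the break, keep the WSP */
            continue;
        }
        buf[w++] = buf[r++];
    }
    buf[w] = '\0';
    return w;
}
```

For example, `Subject: =?US-ASCII?Q?foo?=` followed by the continuation line ` =?US-ASCII?Q?bar?=` becomes a single line with the two encoded words separated by one space; a line break that is not followed by whitespace is left alone.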
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)