segfault on rfc2047-like subject
Matthias Andree
matthias.andree at gmx.de
Sat Oct 9 00:07:06 CEST 2004
Matthias Andree <matthias.andree at gmx.de> writes:
> David, I'm on the bug.
I have found several nits that together make the whole system tumble:
1. inconsistent generation of word_t:
a. several parts of the code stuff a NUL byte at the end of the word,
for instance base64_decode and qp_decode
b. some parts of the code generate a word_t that has a "tight fit"
with no room to append a NUL, for instance yy_text.
This will need to be resolved before 1.0.
2. text_decode somehow (not yet sure) manages to leave NUL bytes in the
word it feeds back:
Breakpoint 1, text_decode (w=0x807f170) at ../../src/lexer.c:292
292 byte *beg = w->text;
(gdb) finish
Run till exit from #0 text_decode (w=0x807f170) at ../../src/lexer.c:292
yylex () at ../../src/lexer_v3.c:2511
2511 yy_c_buf_p = yy_cp;
Value returned is $1 = 103
(gdb) print *((word_t *)0x807f170)
$2 = {leng = 224,
text = 0x80a5bb0 "Re: [Broken] =?ISO-8859-1?Q?=5BBroken=5DBlah=20Foo=E4=20Bar=20Blah"}
(gdb) print *((word_t *)0x807f170)->text at 103
$3 = "Re: [Broken] =?ISO-8859-1?Q?=5BBroken=5DBlah=20Foo=E4=20Bar=20Blah\000 Foo=3D28=5F?= Bar__?= t\344BlahFoo\344t)"
3. text_decode operates on C strings (str* functions) while we have a
(base,length)-tuple that describes our strings. Because of this
inconsistency, flex can pass in an encoded word with an embedded NUL
that text_decode cannot handle.
I'm not yet sure whether 2 (which looks like a fencepost error at this
time) or 3 is the real bug. I'd think str*() function calls do not belong
in functions that deal with word_t.
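
To illustrate points 1 and 3, here is a minimal sketch, not the actual
bogofilter code. Only the leng and text members are taken from the gdb
output above; the word_t layout, the byte typedef and the helper names
word_new_sketch, word_free_sketch and word_find_byte are assumptions made
up for this example. It builds every word with room for a trailing NUL and
searches by length rather than with str* functions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical stand-in for word_t; only leng and text are known
 * from the gdb session, the exact types are assumed. */
typedef struct {
    size_t leng;
    byte *text;
} word_t;

/* Point 1: always reserve one extra byte and NUL-terminate, so that
 * every producer hands out the same kind of word_t. */
static word_t *word_new_sketch(const byte *src, size_t leng)
{
    word_t *w;

    assert(src != NULL);
    w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->text = malloc(leng + 1);
    if (w->text == NULL) {
        free(w);
        return NULL;
    }
    memcpy(w->text, src, leng);
    w->text[leng] = '\0';   /* terminator lives outside the leng bytes */
    w->leng = leng;
    return w;
}

static void word_free_sketch(word_t *w)
{
    if (w != NULL) {
        free(w->text);
        free(w);
    }
}

/* Point 3: a length-aware search. Unlike strchr(), memchr() keeps
 * going past an embedded NUL and stops only after leng bytes. */
static const byte *word_find_byte(const word_t *w, int c)
{
    return memchr(w->text, c, w->leng);
}
```

With a word that carries an embedded NUL, strlen() and strchr() only see
the part before the NUL, while word_find_byte() still finds bytes behind
it. That is the mismatch I suspect in text_decode.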
As a band-aid fix for this particular code, this will work, but the bugs
mentioned above need to be fixed, too:
Index: src/lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.152
diff -u -r1.152 lexer_v3.l
--- src/lexer_v3.l 8 Oct 2004 21:22:25 -0000 1.152
+++ src/lexer_v3.l 8 Oct 2004 22:03:13 -0000
@@ -173,7 +173,7 @@
HTML_ENCODING "&#"x?[[:xdigit:]]+";"
URL_ENCODING "%"[[:xdigit:]][[:xdigit:]]
-ENCODED_WORD =\?{CHARSET}\?[bq]\?[^?]*\?=
+ENCODED_WORD =\?{CHARSET}\?[bq]\?[^?\0]*\?=
ENCODED_TOKEN ({TOKENFRONT}{TOKENMID})?({ENCODED_WORD}{WHITESPACE}+)*{ENCODED_WORD}
/*
--
Matthias Andree
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)