segfault on rfc2047-like subject

Matthias Andree matthias.andree at gmx.de
Sat Oct 9 00:07:06 CEST 2004


Matthias Andree <matthias.andree at gmx.de> writes:

> David, I'm on the bug.

I have found several nits that make the whole system tumble:

1. inconsistent generation of word_t:

   a. several parts of the code stuff a NUL byte at the end of the word,
      for instance base64_decode and qp_decode

   b. some parts of the code generate a word_t that has a "tight fit"
      with no room to append a NUL, for instance yy_text.

This will need to be resolved before 1.0.
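
A minimal sketch of what consistent generation could look like, assuming
every word is built through one constructor that always reserves a byte
for the trailing NUL. The names sketch_word_t, sketch_word_new and
sketch_word_free are hypothetical and are not taken from the bogofilter
sources; only the two fields leng and text are modelled on the real
word_t.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical stand-in for word_t: leng payload bytes in text,
 * always followed by one NUL byte that leng does not count. */
typedef struct {
    size_t leng;
    byte  *text;
} sketch_word_t;

/* Hypothetical constructor: copies len bytes from src and appends a
 * NUL, so no caller ever gets a "tight fit" buffer.
 * Returns NULL if allocation fails. */
static sketch_word_t *sketch_word_new(const byte *src, size_t len)
{
    sketch_word_t *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;

    w->text = malloc(len + 1);          /* +1: room for the NUL */
    if (w->text == NULL) {
        free(w);
        return NULL;
    }

    if (len > 0)
        memcpy(w->text, src, len);      /* embedded NULs are copied as data */
    w->text[len] = '\0';
    w->leng = len;
    return w;
}

/* Releases a word created by sketch_word_new; accepts NULL. */
static void sketch_word_free(sketch_word_t *w)
{
    if (w != NULL) {
        free(w->text);
        free(w);
    }
}
```

With a single entry point like this, code that appends a NUL and code
that relies on leng alone would both see the same layout.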

2. text_decode somehow (I'm not yet sure how) manages to leave NUL bytes
   in the word it feeds back:

Breakpoint 1, text_decode (w=0x807f170) at ../../src/lexer.c:292
292         byte *beg = w->text;
(gdb) finish
Run till exit from #0  text_decode (w=0x807f170) at ../../src/lexer.c:292
yylex () at ../../src/lexer_v3.c:2511
2511            yy_c_buf_p = yy_cp;
Value returned is $1 = 103
(gdb) print *((word_t *)0x807f170)
$2 = {leng = 224, 
  text = 0x80a5bb0 "Re: [Broken] =?ISO-8859-1?Q?=5BBroken=5DBlah=20Foo=E4=20Bar=20Blah"}
(gdb) print *((word_t *)0x807f170)->text at 103
$3 = "Re: [Broken] =?ISO-8859-1?Q?=5BBroken=5DBlah=20Foo=E4=20Bar=20Blah\000  Foo=3D28=5F?= Bar__?= t\344BlahFoo\344t)"

3. text_decode operates on C strings (str* functions) while we have a
   (base,length)-tuple that describes our strings. Because of this
   inconsistency, flex can pass in an encoded word with an embedded NUL
   that text_decode cannot handle.

I'm not yet sure whether 2 (which looks like a fencepost error at this
time) or 3 is the real bug. I'd think str*() function calls do not belong
in functions that deal with word_t.
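
To illustrate the mismatch described in 3, here is a small sketch that
compares what a str*() based routine sees with what a scan bounded by
the length sees. Both function names are hypothetical and this is not
the text_decode code; it only assumes a (base,length) pair like the one
in word_t.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char byte;

/* Number of bytes a str*() based routine would consider:
 * everything up to, but not including, the first NUL. */
static size_t visible_to_str_functions(const byte *text)
{
    return strlen((const char *)text);
}

/* Length-aware search for the "?=" terminator of an encoded word.
 * Scans exactly leng bytes and treats an embedded NUL as ordinary data.
 * Returns the offset of the '?' or leng if no terminator is found. */
static size_t find_encoded_word_end(const byte *text, size_t leng)
{
    size_t i;

    for (i = 0; i + 1 < leng; i++) {
        if (text[i] == '?' && text[i + 1] == '=')
            return i;
    }
    return leng;
}
```

For a 13-byte buffer such as "=?x?Q?ab\0cd?=", the first function stops
at the embedded NUL and never reaches the terminator, while the second
finds it. That is the kind of disagreement between flex (which matched
the whole token) and a str*() based decoder that I suspect here.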

As a band-aid fix for this particular code, this will work, but the bugs
mentioned above need to be fixed, too:

Index: src/lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.152
diff -u -r1.152 lexer_v3.l
--- src/lexer_v3.l	8 Oct 2004 21:22:25 -0000	1.152
+++ src/lexer_v3.l	8 Oct 2004 22:03:13 -0000
@@ -173,7 +173,7 @@
 HTML_ENCODING	"&#"x?[[:xdigit:]]+";"
 URL_ENCODING	"%"[[:xdigit:]][[:xdigit:]]
 
-ENCODED_WORD	=\?{CHARSET}\?[bq]\?[^?]*\?=
+ENCODED_WORD	=\?{CHARSET}\?[bq]\?[^?\0]*\?=
 ENCODED_TOKEN	({TOKENFRONT}{TOKENMID})?({ENCODED_WORD}{WHITESPACE}+)*{ENCODED_WORD}
 
 /*


-- 
Matthias Andree

Encrypted mail welcome: my GnuPG key ID is 0x052E7D95 (PGP/MIME preferred)


