Segfault in 0.15.3

Thu Sep 11 14:56:22 CEST 2003

On Thu, 11 Sep 2003 14:34:57 +0200
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
> > Anyhow, the "X-Subject:" contains
> > "=?ISO-8859-1?Q?f=FC?=?iso-8859-1?Q?r_Website?=" which is _not_
> > valid according to RFC2047.
> > 
> > I'll modify the code so it'll be more tolerant of illegal
> > constructs.
> 
> There could be two ways to handle it. Try to decode it
> anyway or just not decode things which are not well-formed.
> In this example (actually there is one = missing), I'd go for
> the latter solution.
> 
> In things like space missing between MIME word and normal
> text (which we handle correctly by now) I actually found a
> reader which does produce it (so it is not only spam),
> namely some version of Outlook Express for Mac.
> 
> pi

pi,

As they say, there's more than one way to skin a cat :-)

The basic plan with encoded text is to have lexer_v3.l match the
appropriate pattern and then pass the matching text to function
decode_text() to convert base64 and quoted-printable ('B' and 'Q')
chunks to their real value.  Initially the lexer pattern specified the
allowed characters for the 'B' and 'Q' chunks.  Bogofilter paid a
program size penalty of 300k (or so) for doing that.  Now the character
check is done in C by base64_validate() and qp_validate().
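
As an illustration only, a character check of this kind might look
roughly like the sketch below.  It is not bogofilter's code: the names
sketch_base64_ok() and sketch_qp_ok() and their signatures are
hypothetical, and the rules are simply the RFC 2047 ones (base64
alphabet plus up to two trailing '=' for 'B'; printable ASCII without
space or '?', and '=' followed by two hex digits, for 'Q').

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not bogofilter's base64_validate().
 * Accepts base64 alphabet characters followed by at most two '='
 * padding characters.  Only the characters are checked, not the
 * overall length. */
static bool sketch_base64_ok(const char *txt, size_t len)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    size_t i = 0;
    size_t pad = 0;

    /* Skip the run of alphabet characters.  The '\0' test matters:
     * strchr() would otherwise match the alphabet's own terminator. */
    while (i < len && txt[i] != '\0' && strchr(alphabet, txt[i]) != NULL)
        i++;

    /* Allow up to two trailing '=' padding characters. */
    while (i < len && txt[i] == '=' && pad < 2) {
        i++;
        pad++;
    }

    /* Anything left over is an illegal character. */
    return i == len;
}

/* Hypothetical helper, not bogofilter's qp_validate().
 * Accepts printable ASCII except space and '?'; every '=' must be
 * followed by two hex digits. */
static bool sketch_qp_ok(const char *txt, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)txt[i];

        if (c <= ' ' || c >= 0x7f || c == '?')
            return false;

        if (c == '=') {
            if (len - i < 3 ||
                !isxdigit((unsigned char)txt[i + 1]) ||
                !isxdigit((unsigned char)txt[i + 2]))
                return false;
            i += 2;     /* step over the two hex digits */
        }
    }
    return true;
}
```

With these rules, the two 'Q' chunks from the header above, "f=FC" and
"r_Website", both pass the character check; what makes that header
illegal is how the chunks are delimited, not the characters in them.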

IF the lexer pattern is exactly right, then decode_text() will only get
properly formed encoded text chunks and the "pointer == NULL" checks
added today are not necessary.  
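
To show why such NULL checks matter when the pattern is not exactly
right, here is a minimal sketch of splitting one "=?charset?X?text?="
word defensively.  It is an assumption-laden illustration, not
decode_text() itself: struct encoded_word and
sketch_split_encoded_word() are hypothetical names, and the sketch
takes the strict route of rejecting anything that is not well-formed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the original string. */
struct encoded_word {
    const char *charset;
    size_t      charset_len;
    char        encoding;      /* 'B' or 'Q', upper-cased */
    const char *text;
    size_t      text_len;
};

/* Hypothetical helper: split "=?charset?X?text?=" into its parts.
 * Every strchr()/strstr() result is checked before use, so a
 * malformed word is rejected rather than dereferencing NULL. */
static bool sketch_split_encoded_word(const char *word,
                                      struct encoded_word *out)
{
    const char *charset, *q1, *text, *end;
    int enc;

    if (word == NULL || out == NULL)
        return false;
    if (strncmp(word, "=?", 2) != 0)
        return false;

    charset = word + 2;
    q1 = strchr(charset, '?');          /* '?' that ends the charset */
    if (q1 == NULL || q1 == charset)
        return false;

    /* Expect exactly one encoding letter followed by '?'. */
    if (q1[1] == '\0' || q1[2] != '?')
        return false;
    enc = toupper((unsigned char)q1[1]);
    if (enc != 'B' && enc != 'Q')
        return false;

    text = q1 + 3;
    end = strstr(text, "?=");           /* closing delimiter */
    if (end == NULL)
        return false;
    if (end[2] != '\0')                 /* trailing junk after "?=" */
        return false;

    out->charset     = charset;
    out->charset_len = (size_t)(q1 - charset);
    out->encoding    = (char)enc;
    out->text        = text;
    out->text_len    = (size_t)(end - text);
    return true;
}
```

Fed the whole "X-Subject:" value quoted above as a single word, this
sketch returns false, because the first "?=" is followed by more text
instead of the end of the word.  A more tolerant variant could stop at
the first "?=" and decode just that part.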

I've yet to determine whether the lexer pattern should be changed so
that the C code only has to process correctly encoded chunks.  Perhaps
I'll dig deeper and perhaps I'll leave it as it is.

David