Segfault in 0.15.3
David Relson
relson at osagesoftware.com
Thu Sep 11 14:56:22 CEST 2003
On Thu, 11 Sep 2003 14:34:57 +0200
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson wrote:
>
> > Anyhow, the "X-Subject:" contains
> > "=?ISO-8859-1?Q?f=FC?=?iso-8859-1?Q?r_Website?=" which is _not_
> > valid according to RFC2047.
> >
> > I'll modify the code so it'll be more tolerant of illegal
> > constructs.
>
> There are two ways to handle it: try to decode it anyway,
> or just not decode things which are not well-formed.
> In this example (actually there is one = missing), I'd go for
> the latter solution.
>
> In things like space missing between MIME word and normal
> text (which we handle correctly by now) I actually found a
> reader which does produce it (so it is not only spam),
> namely some version of Outlook Express for Mac.
>
> pi
pi,
As they say, there's more than one way to skin a cat :-)
The basic plan with encoded text is to have lexer_v3.l match the
appropriate pattern and then pass the matching text to function
decode_text() to convert base64 and quoted-printable ('B' and 'Q')
chunks to their real values. Initially the lexer pattern specified the
allowed characters for the 'B' and 'Q' chunks. Bogofilter paid a
program size penalty of 300k (or so) for doing that. Now the character
check is done in C by base64_validate() and qp_validate().
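For illustration, here is a minimal sketch of what a C-side character check for 'B' and 'Q' chunks could look like. It is not the bogofilter source: sketch_q_validate() and sketch_b_validate() are hypothetical names, and the rules are simply my reading of RFC 2047 (with lowercase hex digits tolerated as an assumption).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not bogofilter's qp_validate()): true if the len
 * bytes at s contain only characters legal in a 'Q' chunk. */
static bool sketch_q_validate(const char *s, size_t len)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '=') {
            /* '=' must be followed by two hex digits (either case
             * is accepted here, which is an assumption). */
            if (i + 2 >= len ||
                !isxdigit((unsigned char)s[i + 1]) ||
                !isxdigit((unsigned char)s[i + 2]))
                return false;
            i += 2;
        } else if (c == '?' || c <= ' ' || c > '~') {
            /* '?', space, control and 8-bit bytes are not allowed;
             * '_' stands for a space and passes as printable. */
            return false;
        }
    }
    return true;
}

/* Hypothetical helper (not bogofilter's base64_validate()): true if the
 * chunk is padded base64 drawn from the standard alphabet. */
static bool sketch_b_validate(const char *s, size_t len)
{
    assert(s != NULL);
    if (len % 4 != 0)
        return false;
    bool seen_pad = false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '=') {
            /* Padding may only occupy the last two positions. */
            if (i + 2 < len)
                return false;
            seen_pad = true;
        } else if (seen_pad ||
                   !((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                     (c >= '0' && c <= '9') || c == '+' || c == '/')) {
            return false;
        }
    }
    return true;
}
```

With checks like these in C, the lexer pattern only has to recognize the "=?charset?X?...?=" framing and can accept any characters inside the chunk.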
IF the lexer pattern is exactly right, then decode_text() will only get
properly formed encoded text chunks and the "pointer == NULL" checks
added today are not necessary.
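To show why such NULL checks matter when the pattern is not exactly right, here is a rough sketch of splitting an encoded word into its parts. It assumes a NUL-terminated input; sketch_split_encoded_word() and struct ew_parts are made-up names, not the real decode_text() code. Every delimiter search is checked before the result is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the pieces of "=?charset?X?text?=". */
struct ew_parts {
    const char *charset;
    size_t      charset_len;
    char        encoding;   /* 'B', 'b', 'Q' or 'q' as found */
    const char *text;
    size_t      text_len;
};

/* Hypothetical helper: returns false rather than following a NULL
 * pointer when any delimiter is missing. */
static bool sketch_split_encoded_word(const char *w, struct ew_parts *out)
{
    assert(out != NULL);
    if (w == NULL || strncmp(w, "=?", 2) != 0)
        return false;               /* missing leading "=?" */

    const char *charset = w + 2;
    const char *q1 = strchr(charset, '?');
    if (q1 == NULL || q1 == charset)
        return false;               /* no charset delimiter */

    /* Check q1[1] first so q1[2] is never read past the terminator. */
    if (q1[1] == '\0' || q1[2] != '?')
        return false;
    char enc = q1[1];
    if (enc != 'B' && enc != 'b' && enc != 'Q' && enc != 'q')
        return false;

    const char *text = q1 + 3;
    const char *end = strstr(text, "?=");
    if (end == NULL)
        return false;               /* missing trailing "?=" */

    out->charset = charset;
    out->charset_len = (size_t)(q1 - charset);
    out->encoding = enc;
    out->text = text;
    out->text_len = (size_t)(end - text);
    return true;
}
```

Given the header from this thread, the first word "=?ISO-8859-1?Q?f=FC?=" splits cleanly, while the remainder "?iso-8859-1?Q?r_Website?=" is rejected because the leading '=' is missing.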
I've yet to determine whether the lexer pattern should be changed so
that the C code only has to process correctly encoded chunks. Perhaps
I'll dig deeper and perhaps I'll leave it as it is.
David