RFC-2047 & encoded QP text

Matthias Andree matthias.andree at gmx.de
Mon Jul 28 09:40:29 CEST 2003


David Relson <relson at osagesoftware.com> writes:

> Matthias,
>
> The "\?" is not only standards compliant but necessary.  Junior has
> email that kills bogofilter without it because the lexer keeps on
> reading an encoded attachment and runs out of buffer space.  With the
> "\?", the lexer works much better.

> Regarding spaces, if necessary we could try something like "[^\t\r\n\?]"
> and see what happens.  Let's wait until there's a clear need.

Well, I've looked at the situation. Might the mail contain a broken
RFC 2047 encoded word, one that ends in "?" or "=" rather than "?="?

An encoded word does not continue past the end of the line, and we must
account for that. We will also need to remove the linear white space
between two adjacent encoded words, so that:

Summary: =?ISO-8859-1?Q?Regen?=
  =?ISO-8859-1?Q?w=FCrmer=3F?=

yields {Summary; Regenwürmer} rather than {Summary; Regen; würmer}.

This is necessary so spammers can't split up their tokens at will to
hide them from bogofilter's view.
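
To make the two points above concrete, here is a minimal sketch in
Python (not bogofilter's C lexer, and not a proposed patch). The names
ENCODED_WORD, ADJACENT and decode_header_text are made up for this
illustration. It assumes an encoded word contains neither white space
nor "?" in its text part, so a match cannot run past the end of a line,
and it leaves anything it cannot decode untouched:

```python
import base64
import quopri
import re

# One encoded word: =?charset?encoding?text?=
# The text part excludes white space and "?", so a match can never
# continue past the end of the line or into an attachment body.
WORD = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?="
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")

# An encoded word followed by linear white space (possibly a folded
# line break) and then another encoded word.
ADJACENT = re.compile("(" + WORD + r")[ \t\r\n]+(?=" + WORD + ")")


def _decode_word(match):
    charset, encoding, payload = match.groups()
    try:
        if encoding.lower() == "q":
            # header=True makes "_" decode to a space, as the Q encoding requires.
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        else:
            raw = base64.b64decode(payload)
        return raw.decode(charset)
    except (ValueError, LookupError):
        # Bad payload or unknown charset: keep the original text.
        return match.group(0)


def decode_header_text(text):
    # Step 1: drop white space between adjacent encoded words.
    joined = ADJACENT.sub(r"\1", text)
    # Step 2: decode each well-formed encoded word in place.
    return ENCODED_WORD.sub(_decode_word, joined)


header = "Summary: =?ISO-8859-1?Q?Regen?=\n  =?ISO-8859-1?Q?w=FCrmer=3F?="
print(decode_header_text(header))  # Summary: Regenwürmer?
```

A broken word that ends in "?" or "=" instead of "?=" simply fails to
match and is passed through unchanged, so the tokenizer sees it as
ordinary text rather than reading on into the rest of the message.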

I think we'll have to move the RFC-2047 decoding out of the lexer.

-- 
Matthias Andree



