nonconformant RFC-2047 (was: Re: bogofilter-SA-2004-01)

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Mon Nov 8 13:01:57 CET 2004


On Fri, 5 Nov 2004, Matthias Andree wrote:

> An encoded word as per RFC-2047 does not contain line feed characters,
> so we should not accept or attempt to decode them.

It depends on how popular MUAs interpret them. If they decode such words as
if they were valid, then spammers might abuse that misfeature to hide text
from Bogofilter. Rather than

   Subject: spammyword1 spammyword2 spammyword3

they could write:

   Subject: =?iso-8859-1?q?[CR]=12=34=56=...=ab=cd=ef?=

and Bf (using its standard lexer) would see no tokens. On the other hand,
there are other methods to hide spammy tokens (e.g. text interleaved with 
spaces, text sent as an image), and some of them are already quite popular 
today.
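Here is a minimal sketch of the strict check Matthias proposes, in Python. It is not Bogofilter's lexer, and the helper name malformed_encoded_words is hypothetical. RFC 2047 section 2 defines encoded-text as printable ASCII other than "?" and SPACE, so CR and LF can never appear inside a well-formed encoded-word. The sketch finds every "=?charset?enc?...?=" span, including spans that cross line breaks, and reports the ones that fail the strict grammar. For the charset it assumes a conservative subset of the RFC's token characters.

```python
import re

# Conservative subset of RFC 2047 "token" characters (no SPACE, CTLs, especials).
_TOKEN = r"[A-Za-z0-9!#$%&'*+\-^_`{|}~]+"

# Strict encoded-word: encoded-text is printable ASCII 0x21-0x7E except "?" (0x3F),
# so SPACE, CR and LF are all rejected.
_STRICT = re.compile(
    r"=\?(" + _TOKEN + r")\?([BbQq])\?([\x21-\x3e\x40-\x7e]+)\?="
)

# Loose candidate: "=?charset?enc?" then anything (newlines included, via DOTALL)
# up to the next "?=". This is roughly what a tolerant decoder might accept.
_LOOSE = re.compile(r"=\?[^?\s]+\?[^?\s]+\?.*?\?=", re.DOTALL)


def malformed_encoded_words(value: str) -> list[str]:
    """Return each '=?...?=' span in a header value that is not a
    well-formed RFC 2047 encoded-word (e.g. one containing CR or LF)."""
    return [
        m.group(0)
        for m in _LOOSE.finditer(value)
        if not _STRICT.fullmatch(m.group(0))
    ]


if __name__ == "__main__":
    # The example from the message, with [CR] written as "\r".
    print(malformed_encoded_words("=?iso-8859-1?q?\r=12=34=56?="))
    # A well-formed Q-encoded word is not reported.
    print(malformed_encoded_words("=?iso-8859-1?q?caf=E9?="))
```

The sketch leaves the policy open: a filter could skip decoding the reported spans, or decode them leniently the way a tolerant MUA might. That choice is the question discussed above.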

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."




More information about the bogofilter-dev mailing list