BASE64-encoded non-text MIME attachment being tokenized

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Mon Dec 2 11:12:17 CET 2013


On Wed, 30 Oct 2013, Matt Garretson wrote:

> I occasionally see binary attachments that bogofilter (1.2.4) will
> tokenize, resulting in tons of BASE64 string fragments in the token
> stream.  [...]
> Any ideas what is happening here?

I think it's the at sign (@) in the MIME boundary. Bogofilter uses the
following patterns to detect boundaries and they fail to match when
@ appears in the boundary:

BCHARSNOSPC	[[:alnum:]()+_,-./:=?#\']
BCHARS		[[:alnum:]()+_,-./:=?#\' ]
MIME_BOUNDARY	{BCHARS}*{BCHARSNOSPC}

A boundary containing @ is not RFC 2046-compliant but such nonstandard
strings appear in the wild (and there is already one nonstandard char,
namely #, recognized by Bogofilter).

Speaking of nonstandard characters appearing in MIME boundaries, I have 
seen {, } and % as well.

-- 
Pavel Kankovsky aka Peak                          / Jeremiah 9:21        \
"For death is come up into our MS Windows(tm)..." \ 21st century edition /






More information about the Bogofilter mailing list