BASE64 [was: various]

Matthias Andree matthias.andree at gmx.de
Wed Oct 23 01:14:41 CEST 2002


On Tue, 22 Oct 2002, David Relson wrote:

> >That won't work, this termination only happens when "padding" is
> >necessary, i. e. if the length is not divisible by 3, but leaves a
> >remainder of 1 or 2.
> 
> I've got some code that will do the 4x6-bit to 3x8-bit conversion for 
> base64.  However, there's a lack of handling for Content-xxx and related 
> messages and I'm wondering whether I want to (should) tackle the task.
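
To make the 4x6-bit to 3x8-bit conversion and the padding cases quoted above concrete, here is a minimal Python sketch. It is not the code David mentions; the names ALPHABET, decode_quantum and decode_base64 are made up for illustration, and whitespace and line-break handling is left out on purpose.

```python
# Standard base64 alphabet: each character stands for one 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE = {ch: value for value, ch in enumerate(ALPHABET)}


def decode_quantum(group):
    """Turn one 4-character group (4 x 6 bits) into 1 to 3 bytes (3 x 8 bits)."""
    if len(group) != 4:
        raise ValueError("a base64 group must have exactly 4 characters")
    # Padding only appears when the original length left a remainder of
    # 1 byte (two '=' signs) or 2 bytes (one '=' sign).
    pad = len(group) - len(group.rstrip("="))
    data = group[:4 - pad]
    if pad > 2 or "=" in data:
        raise ValueError("misplaced padding")
    bits = 0
    for ch in data:
        if ch not in DECODE:
            raise ValueError("not a base64 character: %r" % ch)
        bits = (bits << 6) | DECODE[ch]
    # Shift in zero bits for the padded positions so we always hold 24 bits.
    bits <<= 6 * pad
    return bits.to_bytes(3, "big")[:3 - pad]


def decode_base64(text):
    """Decode a base64 string that contains no whitespace or line breaks."""
    groups = [text[i:i + 4] for i in range(0, len(text), 4)]
    out = bytearray()
    for index, group in enumerate(groups):
        # Padding is only legal in the final group.
        if "=" in group and index != len(groups) - 1:
            raise ValueError("padding before the end of the data")
        out += decode_quantum(group)
    return bytes(out)
```

For example, decode_base64("TWFu") yields the three bytes of "Man", while "TWE=" and "TQ==" yield two bytes and one byte respectively.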

I've given some thought to whether lex would be smart enough to
parse the MIME structure while still emitting delimiters as tokens and
decoding base64 and qp on the fly. I believe that
Content-Transfer-Encoding should be taken into account because there are
ordinary words (particularly those of 4, 8, or 12 characters) that are also
valid base64; OTOH, we should decide if we're actually interested in more than
the headers of MIME parts that have Content-Type other than text/*. I
believe we should only parse text/*, convert things to UTF-8 (which will
bring another dependency, iconv, jconv, whatever), and only read headers
of other content types. As someone (I believe Boris) has pointed out,
spammers won't use base64 to deceive spam traps because that will render
the mail unreadable.
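
The policy proposed above can be sketched with Python's standard email package. This is only an illustration under assumptions, not a proposal for the actual flex-based implementation: the function name split_parts and the fallback to Latin-1 for unknown charsets are made up here. The first two lines also show why the Content-Transfer-Encoding header is needed: a plain word such as "spam" decodes as base64 without complaint.

```python
import base64
from email import message_from_string

# An ordinary 4-letter word is also syntactically valid base64, so the
# body text alone cannot tell us whether decoding is appropriate.
ambiguous = base64.b64decode("spam")  # succeeds and yields 3 bytes


def split_parts(raw_message):
    """Return (decoded text/* bodies, headers of all other leaf parts)."""
    msg = message_from_string(raw_message)
    texts = []
    headers_only = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        if part.get_content_maintype() == "text":
            # decode=True undoes base64 or quoted-printable according to
            # the part's Content-Transfer-Encoding header.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "us-ascii"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to Latin-1 (assumption).
                text = payload.decode("latin-1")
            texts.append(text)
        else:
            # Non-text parts: keep the headers, ignore the body.
            headers_only.append(dict(part.items()))
    return texts, headers_only
```

Decoding to Python strings here plays the role of the conversion to UTF-8 mentioned above; a C implementation would need iconv or a similar library for that step.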

But, as written above, I'm not sure if we can hook this deeply into
flex or if it takes another layer (bison/yacc) on top of this. Or if we
would need ANTLR, OpenZz or some other scanner generator. I don't have
enough expertise with flex so far.

-- 
Matthias Andree




More information about the bogofilter-dev mailing list