BASE64 [was: various]

David Relson relson at osagesoftware.com
Wed Oct 23 01:00:48 CEST 2002


At 06:56 PM 10/22/02, Matthias Andree wrote:

> > Base64 parsing is a problem.  Not only will 'solitary' on a
> > line be ignored, but a legitimate b64 line, such as
> > 'c29saXRhcnkgd29yZAo=', will be tokenized as 'c29saxrhcnkgd29yzao'.
> > Unfortunately, 'xrhcnk' on a line by itself is also legal base64.
> > While this is permissible, I don't think it's common; I think we're
> > more likely to find base64 lines to be longer than the maximum token
> > length, except when they end in '='.  I haven't thought this through,
> > but I think we may get better results if the base64 re was modified to
> > only catch '='-terminated strings.
>
>That won't work: this termination only happens when "padding" is
>necessary, i.e. if the input length in bytes is not divisible by 3 but
>leaves a remainder of 1 or 2.
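
To make the padding rule concrete, here is a minimal sketch using Python's standard base64 module. It is not bogofilter code; the helper name b64_padding is made up for this illustration. It counts the trailing '=' characters for inputs of different lengths.

```python
import base64


def b64_padding(data: bytes) -> int:
    """Return how many '=' characters end the base64 encoding of data.

    Hypothetical helper for illustration only; not part of bogofilter.
    """
    encoded = base64.b64encode(data)
    return len(encoded) - len(encoded.rstrip(b"="))


# Print (input length mod 3, padding count, encoded text) for a few samples.
for sample in (b"abc", b"abcd", b"abcde", b"solitary word\n"):
    encoded = base64.b64encode(sample).decode("ascii")
    print(len(sample) % 3, b64_padding(sample), encoded)
```

An input whose length is a multiple of 3 encodes with no '=' at all, so a pattern that only accepts '='-terminated strings would miss roughly a third of all base64 lines.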

I've got some code that will do the 4x6-bit to 3x8-bit conversion for
base64.  However, handling for Content-xxx and related messages is
lacking, and I'm wondering whether I should tackle the task.
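
For readers unfamiliar with the 4x6-bit to 3x8-bit conversion, the following is a rough sketch of the idea in Python. It is not the code referred to above; the names ALPHABET, VALUE, decode_quad and decode_line are assumptions made for this example, and it assumes the standard RFC 4648 alphabet with no embedded whitespace.

```python
import string

# Standard base64 alphabet: a character's index is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_quad(quad: str) -> bytes:
    """Decode one 4-character base64 group into 1 to 3 bytes."""
    if len(quad) != 4:
        raise ValueError("base64 groups are exactly 4 characters long")
    pad = len(quad) - len(quad.rstrip("="))
    if pad > 2:
        raise ValueError("at most two '=' padding characters are allowed")
    # Pack the 6-bit values into one integer, most significant group first.
    bits = 0
    for ch in quad[:4 - pad]:
        if ch not in VALUE:
            raise ValueError(f"not a base64 character: {ch!r}")
        bits = (bits << 6) | VALUE[ch]
    # Padding positions contribute zero bits, giving 24 bits in total.
    bits <<= 6 * pad
    # Split into three 8-bit bytes and drop the bytes that padding stood in for.
    return bits.to_bytes(3, "big")[:3 - pad]


def decode_line(line: str) -> bytes:
    """Decode a whole base64 line by decoding each 4-character group."""
    line = line.strip()
    if len(line) % 4 != 0:
        raise ValueError("base64 line length must be a multiple of 4")
    return b"".join(decode_quad(line[i:i + 4]) for i in range(0, len(line), 4))


print(decode_line("c29saXRhcnkgd29yZAo="))
```

Because upper and lower case map to different 6-bit values, the decoding has to run on the original text, before any case folding done by the tokenizer.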




