unfolding header lines

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Sep 4 09:36:44 CEST 2003


David Relson <relson at osagesoftware.com> wrote:

>The initial implementation is done by C function get_unfolded_line()
>which does a bit of pre-reading of text to identify folded lines.  The
>code converts the newlines encountered to spaces.  

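This is wrong. Per RFC 2822, unfolding simply removes the
CRLF in front of the leading whitespace, so the newline must
be deleted, not converted to a space. Not that it matters in
our case ;-) Actually, we could collapse every run of
whitespace (blanks, tabs, newlines) into a single blank.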
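The unfolding rule can be sketched in C. This is a minimal illustration that assumes the whole header field is already in one NUL-terminated buffer. unfold_header() and collapse_whitespace() are hypothetical helpers. They are not bogofilter's get_unfolded_line().

```c
#include <stddef.h>

/*
 * Hypothetical helper: unfold an RFC 2822 header field in place.
 * Every line break (CRLF or bare LF) that is immediately followed by a
 * blank or tab is deleted; the whitespace itself is kept.
 * Returns the new length of the string.
 */
size_t unfold_header(char *s)
{
    char *src = s, *dst = s;

    while (*src != '\0') {
        /* Length of a line break starting here: 2 for CRLF, 1 for LF. */
        size_t nl = (src[0] == '\r' && src[1] == '\n') ? 2
                  : (src[0] == '\n') ? 1 : 0;

        if (nl > 0 && (src[nl] == ' ' || src[nl] == '\t')) {
            src += nl;              /* drop the newline, keep the WSP */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';
    return (size_t)(dst - s);
}

/*
 * Hypothetical helper for the simplification suggested above: collapse
 * every run of blanks, tabs, CRs and LFs into a single blank, in place.
 * Returns the new length of the string.
 */
size_t collapse_whitespace(char *s)
{
    char *src = s, *dst = s;
    int in_ws = 0;

    for (; *src != '\0'; src++) {
        if (*src == ' ' || *src == '\t' || *src == '\r' || *src == '\n') {
            if (!in_ws)
                *dst++ = ' ';       /* first char of a run becomes a blank */
            in_ws = 1;
        } else {
            *dst++ = *src;
            in_ws = 0;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

unfold_header() alone does what RFC 2822 describes. collapse_whitespace() is needed only if we want the simplified form.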

>It all works great -
>until the folded line far exceeds the prescribed max line length
>(RFC-2822, 998 characters).  

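As someone stated, 998 characters is the limit for each
physical line as it is transmitted. RFC 2822 (section 2.2.3)
puts no length restriction on an unfolded header field, so
there is no real limit on the unfolded line.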

>When the input buffer gets close to full
>(over 8k), the function returns and the remainder of the folded line
>isn't tagged.

If we limit unfolded lines to 8k, that will probably be good
enough. But, as I suggested, if we don't want a limit, we
can simply split those long lines into multiple lines.
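Here is a minimal sketch of the splitting idea. It assumes the unfolded line is already in memory as a C string. split_long_line() and chunk_fn are hypothetical names. Breaking just after whitespace, so that tokens stay whole, is an assumption added here and was not specified in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Receives one piece of the line; the piece is not NUL-terminated. */
typedef void (*chunk_fn)(const char *piece, size_t len, void *ctx);

/*
 * Hypothetical helper: split an unfolded line into pieces of at most
 * `max` bytes, preferring to break just after a blank or tab so that no
 * token is cut in two. A token longer than `max` is cut at `max` bytes.
 * `emit` may be NULL to only count the pieces. Returns the piece count.
 */
size_t split_long_line(const char *line, size_t max, chunk_fn emit, void *ctx)
{
    size_t len = strlen(line), pos = 0, count = 0;

    if (max == 0)
        return 0;                       /* no progress possible */
    while (pos < len) {
        size_t take = len - pos;
        if (take > max) {
            size_t cut = max;
            /* Walk back to the last whitespace inside the window. */
            while (cut > 0 && line[pos + cut - 1] != ' '
                           && line[pos + cut - 1] != '\t')
                cut--;
            take = (cut > 0) ? cut : max;   /* hard cut if no whitespace */
        }
        if (emit != NULL)
            emit(line + pos, take, ctx);
        pos += take;
        count++;
    }
    return count;
}
```

Each piece would then be passed to the lexer as a separate line. Because the breaks fall after whitespace, only tokens longer than `max` get cut.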

>It has been suggested that the flex grammar, i.e. lexer_v3.l, is the
>right place to handle the unfolding. 

I don't think so. Unfolding, like any decoding work, can be
done before the text reaches the lexer, which is more
reasonable: the lexer should just read the message the way
you and I see it in our mail readers.
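One way to unfold before the lexer runs, without a fixed 8k buffer, is to read each logical header line from the stream with one character of lookahead. This is a sketch under that assumption. read_unfolded_line() is a hypothetical name. Discarding every CR is a simplification made here.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical reader: returns one logical (unfolded) header line from fp,
 * with folding line breaks deleted and the final line break dropped.
 * The buffer grows as needed, so there is no fixed 8k limit.
 * Returns a malloc'd string (caller frees), or NULL at EOF, at the empty
 * line that ends the header, or if memory runs out.
 * Simplification: every CR is discarded, so CRLF and LF are treated alike.
 */
char *read_unfolded_line(FILE *fp)
{
    size_t len = 0, cap = 128;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (c == '\r')
            continue;
        if (c == '\n') {
            if (len == 0)
                break;                  /* empty line: end of header */
            int next = getc(fp);        /* one character of lookahead */
            if (next != ' ' && next != '\t') {
                if (next != EOF)
                    ungetc(next, fp);   /* belongs to the next field */
                break;
            }
            c = next;                   /* folded: keep only the WSP */
        }
        if (len + 1 >= cap) {           /* room for c and the final NUL */
            size_t newcap = cap * 2;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        buf[len++] = (char)c;
    }
    if (len == 0) {                     /* EOF or the empty separator line */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

When the function returns NULL at the separator line, fp is positioned at the first byte of the body. The lexer would then only ever see unfolded lines.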

>At the end of every message header and mime body part header
>is an empty line.  Using the C code, the pattern for the line is "^[ \t]*$".  

Hm, per RFC 2822 the separator must be a truly empty line,
with nothing before the CRLF, so there must not be any
whitespace in it. Was such a line found in the wild?
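Below is a sketch of two possible tests for the line that ends a header. It assumes the line is passed without its LF. The strict test follows RFC 2822. The lenient test matches the "^[ \t]*$" pattern quoted above. is_header_end() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: does `line` (passed without its LF) end the header?
 * Strict (RFC 2822): the line must be empty.
 * Lenient: also accept blanks and tabs only, like "^[ \t]*$".
 */
bool is_header_end(const char *line, size_t len, bool lenient)
{
    if (len > 0 && line[len - 1] == '\r')
        len--;                          /* ignore the CR of a CRLF */
    if (!lenient)
        return len == 0;
    for (size_t i = 0; i < len; i++)
        if (line[i] != ' ' && line[i] != '\t')
            return false;
    return true;
}
```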

pi



