unfolding header lines

Thu Sep 4 05:10:19 CEST 2003

David Relson <relson at osagesoftware.com> writes:
> Greetings all,
> 
> As you all know, bogofilter now (as of 0.15.1) knows about unfolding
> header lines.  This is useful as it allows tagging of all the tokens of
> a multi-line To:, Subject:, From:, or Return-Path: header line.

[...]

> Using the C code, the pattern for the line is "^[ \t]*$".  When the
> unfolding work shifts into lexer_v3.l, the pattern
> becomes "\n[ \t]*\n" and this causes trouble.  The lexer is in header
> mode as it reads the empty line and as it pre-reads the line _after_
> that.  Being in header mode, base64 and qp decoding don't get applied. 
> End of story :-(

No, you've got the unfolding regex wrong. It's not '^[ \t]*', it's 
'^[ \t]+'.  I.e. one or more, not zero or more.

And one or more doesn't match an empty line (at least, it shouldn't;
I didn't think the RFC allowed whitespace on an empty line).

So just add

<INITIAL>\n[ \t]         ;  /* unfold lines */

to the end of the INITIAL section in the lexer. Or am I misunderstanding
the problem? (Note that you don't need either a + or a * after the
[] as the lexer will eat any additional whitespace normally.)
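
For anyone who wants to see the unfolding idea outside of flex, here is a minimal sketch in plain C. It is not bogofilter code; the function name unfold_headers is made up for illustration. It assumes bare "\n" line endings (no CRLF handling) and follows the RFC 2822 description of unfolding: drop a newline only when the next character is a space or tab, and keep that whitespace. The lexer rule above differs slightly in that it also discards the first whitespace character. An empty line is left alone because its newline is followed by another newline, not by whitespace; a line containing only whitespace would still be treated as a continuation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of bogofilter): copy `in` to `out`,
 * dropping every '\n' that is immediately followed by a space or tab.
 * `out` must have room for at least strlen(in) + 1 bytes.
 * Returns the length of the unfolded string. */
size_t unfold_headers(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; in[i] != '\0'; i++) {
        /* in[i + 1] is safe to read: in[i] is not the terminator,
         * so the worst case is that in[i + 1] is the '\0' itself. */
        if (in[i] == '\n' && (in[i + 1] == ' ' || in[i + 1] == '\t'))
            continue;           /* folded line: drop only the newline */
        out[n++] = in[i];
    }
    out[n] = '\0';
    return n;
}
```

Called on "Subject: a\n b\n\nbody\n", this joins the two Subject: lines into one and leaves the blank separator line and the body untouched.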

Michael.