[cvs] bogofilter/src lexer_v3.l,1.158,1.159

Matthias Andree matthias.andree at gmx.de
Sun Jun 26 10:50:41 CEST 2005


David Relson <relson at users.sourceforge.net> writes:

> Update of /cvsroot/bogofilter/bogofilter/src
> In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26706
>
> Modified Files:
> 	lexer_v3.l 
> Log Message:
> Fix multiple decoding of encoded tokens.
>
> Index: lexer_v3.l
> ===================================================================
> RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
> retrieving revision 1.158
> retrieving revision 1.159
> diff -u -d -r1.158 -r1.159
> --- lexer_v3.l	25 Jun 2005 16:50:38 -0000	1.158
> +++ lexer_v3.l	25 Jun 2005 23:21:06 -0000	1.159
> @@ -227,9 +227,13 @@
>  <BOGO_LEX>^\"{BOGOLEX_TOKEN}\"{NUM_NUM}$	{ return BOGO_LEX_LINE; }
>  <BOGO_LEX>\n					{ lineno += 1; }
>  
> -<INITIAL>{ENCODED_TOKEN}			{ word_t *raw = yy_text();
> -						  word_t *dec = text_decode(raw);
> -						  yy_unput(dec->text, dec->leng);
> +<INITIAL>{ENCODED_TOKEN}			{ static int processed = 0;
> +						  if (processed == lineno) {
> +						      word_t *raw = yy_text();
> +						      word_t *dec = text_decode(raw);
> +						      yy_unput(dec->text, dec->leng);
> +						      processed = lineno;
> +						  }
>  						}

There can be more than one encoded word on the same line, so this
per-line check skips the 2nd and all subsequent words. I have seen these
in the wild in solicited mail, and such behavior is actually recommended
by the standards, i.e. if I have a longish Subject with, say, six words
with umlauts in the first and last word, the recommended encoding is to
encode the first and last word and leave the four words in between
unencoded and outside the "encoded-word" syntax elements.
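
To make the case concrete, here is a minimal sketch (not bogofilter code;
find_encoded_word and count_encoded_words are hypothetical helpers made up
for illustration) that counts the RFC 2047 encoded-words in one unfolded
header line. Any guard that decodes at most once per line would handle only
the first of them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of bogofilter: find the next
 * "=?charset?X?text?=" span at or after p.  Returns its start and stores
 * one-past-its-end in *end, or returns NULL if there is none.
 * Simplified: does not check the 75-character limit or token syntax. */
static const char *find_encoded_word(const char *p, const char **end)
{
    while ((p = strstr(p, "=?")) != NULL) {
        const char *q = strchr(p + 2, '?');          /* end of charset */
        if (q != NULL && q[1] != '\0' && q[2] == '?') {
            const char *e = strstr(q + 3, "?=");     /* end of encoded text */
            if (e != NULL) {
                *end = e + 2;
                return p;
            }
        }
        p += 2;                                      /* not a match, move on */
    }
    return NULL;
}

/* Count the encoded-words in a single (already unfolded) header line. */
static size_t count_encoded_words(const char *line)
{
    size_t n = 0;
    const char *end;

    while ((line = find_encoded_word(line, &end)) != NULL) {
        n++;
        line = end;
    }
    return n;
}
```

For a Subject such as
"=?ISO-8859-1?Q?=DCber?= four plain words here =?ISO-8859-1?Q?Gr=FC=DFe?="
the count is two, both on the same line.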

We'll need to find an approach that either tracks the position with
character accuracy or, preferably, one that works without yy_unput,
probably by moving RFC-2047 decoding out of the lexer into the MIME
decoder, close to header unfolding.
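
A rough sketch of what decoding outside the lexer could look like, under
stated assumptions: decode_header and its helpers are hypothetical names,
not existing bogofilter functions; the input is one unfolded header; no
charset conversion is done; and whitespace between adjacent encoded-words is
kept rather than dropped as RFC 2047 asks. It walks the line once and
decodes every encoded-word it meets, so no yy_unput and no per-line state is
needed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (unsigned char)toupper(c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static int b64val(unsigned char c)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(tbl, c) : NULL;
    return p ? (int)(p - tbl) : -1;
}

/* Decode "Q" text in [s, e): '_' is a space, "=XX" is a hex byte. */
static size_t decode_q(const char *s, const char *e, char *out)
{
    size_t n = 0;
    while (s < e) {
        if (*s == '_') {
            out[n++] = ' ';
            s++;
        } else if (*s == '=' && e - s >= 3 &&
                   hexval((unsigned char)s[1]) >= 0 &&
                   hexval((unsigned char)s[2]) >= 0) {
            out[n++] = (char)(hexval((unsigned char)s[1]) * 16 +
                              hexval((unsigned char)s[2]));
            s += 3;
        } else {
            out[n++] = *s++;
        }
    }
    return n;
}

/* Decode "B" (base64) text in [s, e); padding and junk are skipped. */
static size_t decode_b(const char *s, const char *e, char *out)
{
    size_t n = 0;
    unsigned acc = 0;
    int bits = 0;
    for (; s < e; s++) {
        int v = b64val((unsigned char)*s);
        if (v < 0) continue;
        acc = (acc << 6) | (unsigned)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (char)((acc >> bits) & 0xFF);
        }
    }
    return n;
}

/* Decode all encoded-words of the NUL-terminated header `in` into `out`.
 * `out` needs strlen(in) + 1 bytes; decoding never lengthens the text.
 * Returns the number of bytes written, not counting the final NUL. */
static size_t decode_header(const char *in, char *out)
{
    size_t n = 0;
    while (*in != '\0') {
        if (in[0] == '=' && in[1] == '?') {
            const char *cs = strchr(in + 2, '?');    /* end of charset */
            if (cs != NULL && cs[1] != '\0' && cs[2] == '?') {
                const char *text = cs + 3;
                const char *end = strstr(text, "?=");
                int enc = toupper((unsigned char)cs[1]);
                if (end != NULL && (enc == 'Q' || enc == 'B')) {
                    n += (enc == 'Q') ? decode_q(text, end, out + n)
                                      : decode_b(text, end, out + n);
                    in = end + 2;
                    continue;
                }
            }
        }
        out[n++] = *in++;                            /* plain byte, copy */
    }
    out[n] = '\0';
    return n;
}
```

The lexer would then only ever see already decoded header text, so the
question of how often a token has been pushed back does not come up.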

-- 
Matthias Andree


