Encoded filenames

Matthias Andree matthias.andree at gmx.de
Tue Nov 30 12:57:55 CET 2004


On Mon, 29 Nov 2004, David Relson wrote:

> You're going to love this.  In Evgeny's message, the filenames are
> base64 Windows-1251 encoded, i.e. "=?Windows-1251?B?...", and are quite
> long.  Simply changing them to "filename.txt" (or anything else simple
> and short) changes the processing time for the message to 0.02 sec
> (from 12+ sec).

There be RFC-2047 bugs. lexer_v3.l must not call text_decode(), but it
does, and can hence recurse, leading to bogus results.
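
For reference, an RFC 2047 encoded-word has the shape
"=?charset?encoding?encoded-text?=", where encoding is B (base64) or Q.
A minimal C sketch of splitting one into its fields in a single pass,
with no decoding and no recursion, could look like the following.
split_encoded_word() is a hypothetical helper for illustration only; it
is not bogofilter code and says nothing about how text_decode() works.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of bogofilter): split an RFC 2047
 * encoded-word "=?charset?encoding?encoded-text?=" into its fields.
 * On success returns 1, copies the charset into `charset`, stores the
 * encoding letter in `*encoding`, and points `*text`/`*text_len` at the
 * still-encoded payload inside `word`. Returns 0 on malformed input. */
static int split_encoded_word(const char *word,
                              char *charset, size_t charset_size,
                              char *encoding,
                              const char **text, size_t *text_len)
{
    size_t len = strlen(word);
    size_t q1_off, cs_len;
    const char *q1;

    /* Shortest well-formed word is "=?c?B?x?=" (9 characters). */
    if (len < 9 || strncmp(word, "=?", 2) != 0 ||
        strcmp(word + len - 2, "?=") != 0)
        return 0;

    /* First '?' after the leading "=?" terminates the charset. */
    q1 = strchr(word + 2, '?');
    if (q1 == NULL)
        return 0;
    q1_off = (size_t)(q1 - word);

    /* Need room for "?E?" plus at least one payload character
     * before the closing "?=". */
    if (q1_off + 3 >= len - 2)
        return 0;

    cs_len = q1_off - 2;
    if (cs_len == 0 || cs_len >= charset_size)
        return 0;

    /* The encoding is exactly one letter, followed by '?'. */
    if (q1[2] != '?' || strchr("BbQq", q1[1]) == NULL)
        return 0;

    *text = q1 + 3;
    *text_len = (len - 2) - (q1_off + 3);

    /* The payload itself must not contain '?'. */
    if (memchr(*text, '?', *text_len) != NULL)
        return 0;

    memcpy(charset, word + 2, cs_len);
    charset[cs_len] = '\0';
    *encoding = q1[1];
    return 1;
}
```

Called on an encoded filename it would report the charset and the B
encoding and leave the payload untouched; a plain "filename.txt" is
rejected because it lacks the "=?" ... "?=" framing.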

Anyways, it appears we're treating everything as text that isn't
text/html or message/*.

I haven't got the test case here so I can't try. How's this patch?
It's test neutral.

Index: src/lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.154
diff -u -r1.154 lexer_v3.l
--- src/lexer_v3.l	23 Nov 2004 04:28:02 -0000	1.154
+++ src/lexer_v3.l	30 Nov 2004 11:55:56 -0000
@@ -215,6 +215,7 @@
 %s TEXT HTML BOGO_LEX
 %s HTOKEN HDISCARD SCOMMENT LCOMMENT
 %s PGP_HEAD PGP_BODY
+%s SKIPPART
 
 %%
 
@@ -259,8 +260,12 @@
 						  clr_tag();
 						  switch (type) { 
 						  case MIME_TEXT_HTML:	BEGIN HTML; break;
+						  case MIME_TEXT_PLAIN:
+						  case MIME_TEXT:
+									BEGIN
+TEXT; break;
 						  case MIME_MESSAGE:	yy_set_state_initial(); break;
-						  default:		BEGIN TEXT; 
+						  default: BEGIN SKIPPART;
 						  }
 						  return EOH;
 						}
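
Outside of flex, the effect of the second hunk can be sketched as a
plain C function. The enums and state_for_part() below are stand-ins
invented for this sketch (including MIME_APPLICATION and MIME_IMAGE);
the real definitions are in the bogofilter sources and may differ, and
how the SKIPPART state consumes a part is not shown in the patch.

```c
/* Hypothetical stand-ins for the lexer's MIME types; not the real
 * bogofilter definitions. */
enum mime_type {
    MIME_TEXT,
    MIME_TEXT_PLAIN,
    MIME_TEXT_HTML,
    MIME_MESSAGE,
    MIME_APPLICATION,   /* assumed example of a non-text type */
    MIME_IMAGE          /* assumed example of a non-text type */
};

/* Hypothetical stand-ins for the flex start conditions. */
enum lex_state {
    STATE_INITIAL,
    STATE_TEXT,
    STATE_HTML,
    STATE_SKIPPART
};

/* Pick the start condition for a MIME part once its header ends.
 * Before the patch the default branch selected TEXT, so every part
 * that was not text/html or message/* was lexed as text. With the
 * patch only text parts reach TEXT; everything else is skipped. */
static enum lex_state state_for_part(enum mime_type type)
{
    switch (type) {
    case MIME_TEXT_HTML:
        return STATE_HTML;
    case MIME_TEXT_PLAIN:
    case MIME_TEXT:
        return STATE_TEXT;
    case MIME_MESSAGE:
        return STATE_INITIAL;   /* corresponds to yy_set_state_initial() */
    default:
        return STATE_SKIPPART;
    }
}
```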

-- 
Matthias Andree


