Encoded filenames
Matthias Andree
matthias.andree at gmx.de
Tue Nov 30 12:57:55 CET 2004
On Mon, 29 Nov 2004, David Relson wrote:
> You're going to love this. In Evgeny's message, the filenames are
> base64 Windows-1251 encoded, i.e. "=?Windows-1251?B?...", and are quite
> long. Simply changing them to "filename.txt" (or anything else simple
> and short) changes the processing time for the message to 0.02 sec
> (from 12+ sec).
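
For reference, an RFC 2047 encoded-word has the form =?charset?encoding?encoded-text?=, where the encoding is B (base64) or Q. The following is a minimal sketch in C of splitting such a word into its fields. split_encoded_word is a hypothetical helper written for illustration only; it is not bogofilter's text_decode(), and the sample string in the note after it is made up, not taken from Evgeny's message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of bogofilter: split an RFC 2047
 * encoded-word of the form =?charset?encoding?encoded-text?= into fields.
 * On success returns 1, copies the charset into `charset`, stores the
 * encoding letter ('B' or 'Q', upper-cased) in *encoding, and points
 * *text / *textlen at the still-encoded payload inside `s`.
 * Returns 0 if `s` is not a well-formed encoded-word. */
static int split_encoded_word(const char *s, char *charset, size_t cslen,
                              char *encoding, const char **text, size_t *textlen)
{
    size_t n = strlen(s);
    const char *end, *q1, *q2;
    char enc;

    if (n < 8 || strncmp(s, "=?", 2) != 0 || strcmp(s + n - 2, "?=") != 0)
        return 0;
    end = s + n - 2;                    /* start of the closing "?=" */

    q1 = strchr(s + 2, '?');            /* '?' that ends the charset */
    if (q1 == NULL || q1 == s + 2 || (size_t)(q1 - (s + 2)) >= cslen)
        return 0;

    enc = q1[1];                        /* single-letter encoding field */
    if (enc == 'b') enc = 'B';
    if (enc == 'q') enc = 'Q';
    if (enc != 'B' && enc != 'Q')
        return 0;

    q2 = q1 + 2;                        /* '?' that ends the encoding */
    if (q2 >= end || *q2 != '?')
        return 0;
    if (memchr(q2 + 1, '?', (size_t)(end - (q2 + 1))) != NULL)
        return 0;                       /* '?' may not appear in the payload */

    memcpy(charset, s + 2, (size_t)(q1 - (s + 2)));
    charset[q1 - (s + 2)] = '\0';
    *encoding = enc;
    *text = q2 + 1;
    *textlen = (size_t)(end - (q2 + 1));
    return 1;
}
```

Called on the made-up word "=?Windows-1251?B?8uXx8i50eHQ=?=", the sketch reports charset "Windows-1251", encoding 'B' and a 12-byte base64 payload, while a plain "filename.txt" is rejected. It only splits the fields; decoding the payload and converting the charset are left out.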
There be RFC-2047 bugs. lexer_v3.l must not call text_decode(), but it
does, and can hence recurse, leading to bogus results.
Anyways, it appears we're treating everything as text that isn't
text/html or message/*.
I haven't got the test case here so I can't try. How's this patch?
It's test neutral.
Index: src/lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.154
diff -u -r1.154 lexer_v3.l
--- src/lexer_v3.l 23 Nov 2004 04:28:02 -0000 1.154
+++ src/lexer_v3.l 30 Nov 2004 11:55:56 -0000
@@ -215,6 +215,7 @@
%s TEXT HTML BOGO_LEX
%s HTOKEN HDISCARD SCOMMENT LCOMMENT
%s PGP_HEAD PGP_BODY
+%s SKIPPART
%%
@@ -259,8 +260,12 @@
clr_tag();
switch (type) {
case MIME_TEXT_HTML: BEGIN HTML; break;
+ case MIME_TEXT_PLAIN:
+ case MIME_TEXT:
+ BEGIN
+TEXT; break;
case MIME_MESSAGE: yy_set_state_initial(); break;
- default: BEGIN TEXT;
+ default: BEGIN SKIPPART;
}
return EOH;
}
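
The effect of the second hunk can be sketched outside the lexer as a plain C function that maps a part's MIME type to the start condition it is lexed in. This is only an illustration under assumptions: the enum values below are stand-ins for the constants named in the patch, MIME_APPLICATION is a hypothetical example of "any other type", and ST_INITIAL stands for whatever yy_set_state_initial() selects.

```c
/* Stand-ins for the MIME type constants used in the patch; the real
 * definitions live in bogofilter. MIME_APPLICATION is hypothetical and
 * only represents "some type that is neither text nor message". */
enum mime_type {
    MIME_TEXT,
    MIME_TEXT_PLAIN,
    MIME_TEXT_HTML,
    MIME_MESSAGE,
    MIME_APPLICATION
};

/* Stand-ins for the flex start conditions the patch switches between. */
enum lex_state { ST_INITIAL, ST_TEXT, ST_HTML, ST_SKIPPART };

/* Mirror of the patched switch: only text parts are tokenized as text,
 * message/* restarts header parsing, everything else is skipped. */
static enum lex_state state_for_part(enum mime_type type)
{
    switch (type) {
    case MIME_TEXT_HTML:
        return ST_HTML;
    case MIME_TEXT_PLAIN:
    case MIME_TEXT:
        return ST_TEXT;
    case MIME_MESSAGE:
        return ST_INITIAL;      /* assumed result of yy_set_state_initial() */
    default:
        return ST_SKIPPART;     /* before the patch this fell through to TEXT */
    }
}
```

With the old default, state_for_part(MIME_APPLICATION) would have yielded the text state; with the patch it yields the skip state, so bodies of non-text parts are no longer fed through the text rules.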
--
Matthias Andree