Encoded headers parser

David Relson relson at osagesoftware.com
Sat Jul 26 22:26:51 CEST 2003


At 03:39 PM 7/26/03, Junior wrote:
>On Fri, Jul 25, 2003 at 03:59:13PM -0400, David Relson wrote:
>| Junior,
>|
>| Currently, bogofilter has the following in lexer_v3.l:
>|
>| BASE64          [0-9a-zA-Z/+=]+
>| QP              [^[:blank:]]+
>| ENCODED_TOKEN   {BOGOLEX_TOKEN}*=\?{ID}\?(b\?{BASE64}|q\?{QP})\?\=
>|
>| It's more specific than what you're running and should do better.  Let me
>| know how it goes.
>
>It seems to be correct, but the <INITIAL>{ENCODED_TOKEN} rule in the lexer
>*is* matching in the message body; I saw this after compiling with flex -d.
>I don't know much about these things, but the problem may be in the logic
>and not in the rules.

I'll check.  The rules and prefixes probably are not exactly right.
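For reference, RFC 2047 defines an encoded-word as `=?charset?encoding?encoded-text?=`. The encoded-text may not contain `?` or spaces, and the whole word may be at most 75 characters long. The C sketch below checks a candidate against those limits. The function `encoded_word_len` is hypothetical, not part of bogofilter, and it accepts only the B and Q encodings. Because the encoded-text stops at the first `?`, space, or control character (including a newline), a match cannot extend past the end of a header line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Maximum length of an encoded-word per RFC 2047, delimiters included. */
#define MAX_ENCODED_WORD 75

/* A token character: not SPACE, not a control character, not an RFC 2047 "especial". */
static int is_token_char(unsigned char c)
{
    return c > ' ' && c < 0x7f && strchr("()<>@,;:\\\"/[]?.=", c) == NULL;
}

/* Hypothetical helper: if s starts with "=?charset?B|Q?encoded-text?=",
 * return the length of that encoded-word; otherwise return 0. */
static size_t encoded_word_len(const char *s)
{
    size_t i, start;

    if (s[0] != '=' || s[1] != '?')
        return 0;
    i = 2;

    /* charset: one or more token characters, then '?' */
    start = i;
    while (is_token_char((unsigned char)s[i]))
        i++;
    if (i == start || s[i] != '?')
        return 0;
    i++;

    /* encoding: a single B or Q (either case), then '?' */
    if (toupper((unsigned char)s[i]) != 'B' && toupper((unsigned char)s[i]) != 'Q')
        return 0;
    i++;
    if (s[i] != '?')
        return 0;
    i++;

    /* encoded-text: printable ASCII other than '?' and SPACE, never past
     * the 75-character limit, so the scan cannot run into the body */
    start = i;
    while ((unsigned char)s[i] > ' ' && (unsigned char)s[i] < 0x7f
           && s[i] != '?' && i < MAX_ENCODED_WORD)
        i++;
    if (i == start || s[i] != '?' || s[i + 1] != '=')
        return 0;
    i += 2;

    return i <= MAX_ENCODED_WORD ? i : 0;
}
```

Applied to the start of the filename value in Junior's sample, it returns 43, the length of the encoded-word. If a newline appears before the closing `?=`, it returns 0.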

>Here is the chunk of the email, so you can verify it:
>
>         ------=_NextPart_000_0005_01C20B05.A109EF00
>         Content-Type: application/msword;
>                         name="=?iso-8859-1?Q?RESOLU=C7=C3O-P1-CCO06.doc?="
>         Content-Transfer-Encoding: base64
>         Content-Disposition: attachment;
>                         filename="=?iso-8859-1?Q?RESOLU=C7=C3O-P1-CCO06.doc?="
>
>         0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAYgAAAAAAAAAA
>         EAAAZAAAAAEAAAD+////AAAAAGEAAAD/////////////////////////////////////////////
>         ////////////////////////////////////////////////////////////////////////////
>
>When the MIME part is smaller, bogofilter parses it correctly, but
>when it is bigger, bogofilter crashes. I suspect the problem is in the
>filename line: the lexer tries to decode it and runs on into the MIME
>body as well. If I put a space before the last ?=, it works fine!

Adding the space keeps the lexer from matching the pattern at all, so it
doesn't qualify as a fix :-(
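One detail may be relevant to the runaway match described above. The flex manual says its POSIX-style classes such as `[:blank:]` match the same characters as the corresponding C `isXXX` function, and in the C locale `isblank()` is true only for space and tab. So `[^[:blank:]]+` in the QP definition also accepts newlines, and a base64 body, which has no spaces or tabs, gives it nothing to stop on. The C sketch below uses a hypothetical helper, `nonblank_run`, to measure how far such a run extends. It illustrates the character class only and is not a confirmed diagnosis of the crash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the longest prefix of s made of
 * characters that [^[:blank:]] accepts, i.e. anything but space and tab.
 * '\n' is not blank, so the run continues across line breaks. */
static size_t nonblank_run(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && !isblank((unsigned char)s[n]))
        n++;
    return n;
}
```

For example, `nonblank_run("abc ?=")` returns 3. The space ends the run, which fits the observation that adding a space before `?=` keeps the pattern from matching.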


>The "mailer" used to send this email (I suspect it is widely used :)
>
>         X-Mailer: Microsoft Outlook Express 5.00.2919.6700
>         X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
>         MIME-Version: 1.0
>
>Hope this helps. And good luck with this great work: support for decoded
>tokens is the feature I have been waiting for most, because here in Brazil
>most headers contain accented text and arrive encoded in QP or
>B64. It should improve accuracy a lot.
>
>All the best! (Sorry, I don't know much English and I need to learn
>some ways to finish my emails without them sounding cold or
>impolite :D )
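The filename in the sample uses the RFC 2047 "Q" encoding, where `=XX` stands for the byte with hexadecimal value XX and `_` stands for a space. In ISO-8859-1, 0xC7 is Ç and 0xC3 is Ã, so `RESOLU=C7=C3O-P1-CCO06.doc` decodes to `RESOLUÇÃO-P1-CCO06.doc`. Below is a minimal C sketch of that decoding step. The function `q_decode` is hypothetical, not bogofilter's API, and it produces raw ISO-8859-1 bytes, leaving any charset conversion to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)toupper(c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode Q-encoded text from in into out.
 * Decoding never makes the text longer, so out needs room for
 * strlen(in) + 1 bytes. Returns the number of decoded bytes. */
static size_t q_decode(const char *in, char *out)
{
    size_t n = 0;

    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];

        if (c == '_') {
            out[n++] = ' ';                 /* '_' always means 0x20 */
        } else if (c == '=' && hex_value((unsigned char)in[i + 1]) >= 0
                            && hex_value((unsigned char)in[i + 2]) >= 0) {
            out[n++] = (char)(hex_value((unsigned char)in[i + 1]) * 16
                              + hex_value((unsigned char)in[i + 2]));
            i += 2;                         /* skip the two hex digits */
        } else {
            out[n++] = (char)c;             /* ordinary or malformed: copy */
        }
    }
    out[n] = '\0';
    return n;
}
```

Decoding the sample's encoded-text yields 22 bytes, with 0xC7 and 0xC3 at offsets 6 and 7.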

Thanks for the kind words and the problem samples.  I'll do some testing to 
see what's happening and what needs to be changed.

Cheers!

David
