Encoded headers parser

Sat Jul 26 22:26:51 CEST 2003

At 03:39 PM 7/26/03, Junior wrote:
>On Fri, Jul 25, 2003 at 03:59:13PM -0400, David Relson wrote:
>| Junior,
>|
>| Currently, bogofilter has the following in lexer_v3.l:
>|
>| BASE64          [0-9a-zA-Z/+=]+
>| QP              [^[:blank:]]+
>| ENCODED_TOKEN   {BOGOLEX_TOKEN}*=\?{ID}\?(b\?{BASE64}|q\?{QP})\?\=
>|
>| It's more specific than what you're running and should do better.  Let me
>| know how it goes.
>
>It seems to be correct, but the <INITIAL>{ENCODED_TOKEN} rule in the lexer
>*is* matching in the message body; I compiled with flex -d to check. I
>don't know much about these things, but the problem may be in the logic
>rather than in the rules.

I'll check.  The rules and prefixes probably are not exactly right.
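For reference, RFC 2047 defines an encoded-word as =?charset?encoding?encoded-text?=, where encoded-text may contain only printable ASCII other than "?" and SPACE, and the whole word may be at most 75 characters. The QP pattern quoted above is looser than that: in flex a negated class such as [^[:blank:]] also matches newline, so a match is not confined to the header line. Below is a minimal C sketch, not bogofilter's code, of a recognizer that enforces the RFC limits; the function name encoded_word_len is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* RFC 2047 "token" character (used for charset and encoding):
 * printable ASCII, excluding SPACE, CTLs and the especials. */
static int is_token_char(unsigned char c)
{
    if (c <= 0x20 || c >= 0x7F)
        return 0;                     /* SPACE, control chars, non-ASCII */
    return strchr("()<>@,;:\\\"/[]?.=", c) == NULL;
}

/* Hypothetical helper: if s begins with an RFC 2047 encoded-word,
 * return its length in bytes; otherwise return 0. */
size_t encoded_word_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *text;

    if (p[0] != '=' || p[1] != '?')
        return 0;
    p += 2;

    /* charset: one or more token characters, then '?' */
    if (!is_token_char(*p))
        return 0;
    while (is_token_char(*p))
        p++;
    if (*p++ != '?')
        return 0;

    /* encoding: a single B or Q (either case), then '?' */
    if (*p != 'B' && *p != 'b' && *p != 'Q' && *p != 'q')
        return 0;
    p++;
    if (*p++ != '?')
        return 0;

    /* encoded-text: printable ASCII except '?' and SPACE.  This also
     * stops at TAB, CR and LF, so a match never crosses a line end. */
    text = p;
    while (*p > 0x20 && *p < 0x7F && *p != '?')
        p++;
    if (p == text)
        return 0;                     /* encoded-text must be non-empty */

    /* closing "?=" */
    if (p[0] != '?' || p[1] != '=')
        return 0;
    p += 2;

    /* RFC 2047 limits an encoded-word to 75 characters */
    if ((size_t)(p - (const unsigned char *)s) > 75)
        return 0;
    return (size_t)(p - (const unsigned char *)s);
}
```

Applied to the filename value from Junior's sample, this returns 43 (the length of the encoded-word, stopping before the closing double quote). With a space inserted before the final ?= it returns 0, and a newline inside the encoded-text also makes it return 0 instead of running on into the body.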

>Here is the chunk of email, so you can verify it:
>
>         ------=_NextPart_000_0005_01C20B05.A109EF00
>         Content-Type: application/msword;
>                         name="=?iso-8859-1?Q?RESOLU=C7=C3O-P1-CCO06.doc?="
>         Content-Transfer-Encoding: base64
>         Content-Disposition: attachment;
>                         filename="=?iso-8859-1?Q?RESOLU=C7=C3O-P1-CCO06.doc?="
>
>         0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAYgAAAAAAAAAA
>         EAAAZAAAAAEAAAD+////AAAAAGEAAAD/////////////////////////////////////////////
>         ////////////////////////////////////////////////////////////////////////////
>
>When the MIME part is small, bogofilter still parses correctly, but
>when it is bigger, it crashes. I suspect the problem is the filename
>line: the lexer tries to decode it and runs on into the MIME body too.
>If I put a space before the final ?=, it works OK!

Adding the space keeps the lexer from matching the pattern at all, so it
doesn't qualify as a fix :-(

>The "mailer" used to send this email (I suspect it is widely used :)
>
>         X-Mailer: Microsoft Outlook Express 5.00.2919.6700
>         X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
>         MIME-Version: 1.0
>
>Hope this helps. And good luck with this great work! Support for decoded
>tokens is the feature I was waiting for most, because here in Brazil
>most headers contain accented text and arrive encoded in QP or
>B64. It should improve accuracy a lot.
>
>Best regards! (Sorry, I don't know much English and I need to learn
>some ways to end my emails without them sounding cold or
>impolite :D )
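The filename in the sample shows what decoding buys here. In RFC 2047's Q encoding, "=XX" is a byte given in hex and "_" stands for a space (0x20), so RESOLU=C7=C3O decodes to the ISO-8859-1 bytes for "RESOLUÇÃO" (0xC7 is Ç, 0xC3 is Ã). Below is a minimal C sketch, assuming the caller has already isolated the encoded-text between the third "?" and the closing "?="; the name q_decode is hypothetical, and conversion from the declared charset is left out.

```c
#include <stddef.h>

/* Value of a hex digit, or -1.  RFC 2047 specifies uppercase hex;
 * lowercase is accepted here as a leniency. */
static int hex_val(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode n bytes of Q-encoded text from src into
 * dst, which must hold at least n bytes (output is never longer than
 * input).  Returns the decoded length.  A malformed "=XX" sequence is
 * copied through literally. */
size_t q_decode(const char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;

    while (i < n) {
        unsigned char c = (unsigned char)src[i];

        if (c == '_') {
            dst[out++] = ' ';         /* "_" always means 0x20 in Q */
            i++;
        } else if (c == '=' && i + 2 < n &&
                   hex_val((unsigned char)src[i + 1]) >= 0 &&
                   hex_val((unsigned char)src[i + 2]) >= 0) {
            dst[out++] = (unsigned char)(hex_val((unsigned char)src[i + 1]) * 16 +
                                         hex_val((unsigned char)src[i + 2]));
            i += 3;                   /* consumed "=XX" */
        } else {
            dst[out++] = c;           /* ordinary character */
            i++;
        }
    }
    return out;
}
```

On the 26-character text RESOLU=C7=C3O-P1-CCO06.doc this yields 22 bytes, with 0xC7 and 0xC3 in place of the two escapes; the resulting token could then be converted from ISO-8859-1 before being counted.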

Thanks for the kind words and the problem samples.  I'll do some testing to 
see what's happening and what needs to be changed.

Cheers!

David