attachments and binary data

David Relson relson at osagesoftware.com
Fri Nov 26 18:55:00 CET 2004


On Fri, 26 Nov 2004 18:45:50 +0100
Matthias Andree wrote:

> On Fri, 26 Nov 2004, Evgeny Kotsuba wrote:
> 
> [about slowness in bogofilter 0.17]
> > Matthias Andree wrote:
> > 
> > >So? Try 0.92.8 and let us know if the problem persists.
> > >
> > It seems that there are no such problems now.
> 
> Good.
> 
> > It seems there are still some problems with attachments made by
> > Microsoft Outlook Express
> 
> This is not an attachment, but inlined uuencoded data. I'm not sure
> how we should proceed in this case, because this isn't really
> structured and hard to distinguish from text (we'd have to peek into
> the data to see if it's really a file or just something that looks
> similar, see http://support.microsoft.com/?kbid=265230)
> 
>> begin 666 LK2540-7R.pdf
...[snip].....
>> ================= endof clinical case 1

>> ======  clinical case 2 =========
>> Message-ID: <009701c3e144$0ec14a00$0f02000a at blabla.ru>
...[snip]...
>> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
>> 
>>
>OloO7nXAjYZe1FVeE4saqjGe9SoVWgrqHi7qwIpNB268PaO7ILk9oamzdYkuNwoUS9AsfV
>FoNBvg>
>Eq2Uq8g53WyKoqgbVtBJhguLg8UlM0kqmtJyrLQfkIKqSahKW9eAGJU0L5KpVq4R+H/+qZ
>+8SFh4

> And this may warrant further investigation.

In case 1, the "begin 666 LK2540-7R.pdf" line starts an attachment,
AFAICT, while case 2 is inlined 'stuff' (to use the technical term).  A
few lines of code in lexer_v3.l are sufficient to discard the attachment.
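
To make the idea concrete, here is a rough sketch in plain C rather than
an actual lexer_v3.l rule.  It assumes the input is processed line by
line, that an attachment starts at a "begin <mode> <filename>" line and
runs through the matching "end" line, and that the mode is 3 or 4 octal
digits.  The names is_uu_begin, is_uu_end, uu_state and uu_discard_line
are made up for illustration and are not bogofilter functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `line` looks like a uuencode header,
 * i.e. "begin <3 or 4 octal digits> <non-empty filename>". */
bool is_uu_begin(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "begin ", 6) != 0)
        return false;

    const char *p = line + 6;
    int digits = 0;
    while (*p >= '0' && *p <= '7') {   /* octal permission bits */
        p++;
        digits++;
    }
    if (digits < 3 || digits > 4 || *p != ' ')
        return false;

    p++;                               /* skip the separating space */
    return *p != '\0' && *p != '\n' && *p != '\r';
}

/* Hypothetical helper: true if `line` is the uuencode trailer "end". */
bool is_uu_end(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "end", 3) != 0)
        return false;
    return line[3] == '\0' || line[3] == '\n' || line[3] == '\r';
}

/* Tracks whether we are currently inside a uuencoded block. */
typedef struct {
    bool in_uu;
} uu_state;

/* Returns true if `line` belongs to a uuencoded block (header, body or
 * trailer) and should be dropped instead of being tokenized. */
bool uu_discard_line(uu_state *st, const char *line)
{
    assert(st != NULL);
    if (st->in_uu) {
        if (is_uu_end(line))
            st->in_uu = false;
        return true;
    }
    if (is_uu_begin(line)) {
        st->in_uu = true;
        return true;
    }
    return false;
}
```

A caller would keep one uu_state per message and skip every line for
which uu_discard_line() returns true.  Note that this does nothing for
case 2, which has no "begin" line to key on.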

We could get fancy and have the lexer call a function with the filename,
and have that function return 'keep' if the file has a text-related
extension or 'discard' if it looks like a binary file.  However, as
mentioned elsewhere, my archive (two years of messages) contains only 6
references to "begin 666", and 3 of those are from Oct 2003, the other
time Evgeny reported this problem.  It doesn't look like an important
issue to me (though he might disagree).
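
For what it's worth, the 'fancy' variant might look roughly like the
sketch below.  The function name classify_uu_filename, the enum, and the
list of text-related extensions are all assumptions made for the sake of
the example; treating a missing or unknown extension as 'discard' is
also just one possible policy.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { UU_KEEP, UU_DISCARD } uu_action;

/* Case-insensitive comparison of two extensions such as ".TXT"/".txt". */
static bool ext_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical classifier: keep the data if the filename carries a
 * text-related extension, discard it otherwise. */
uu_action classify_uu_filename(const char *filename)
{
    /* Illustrative list only; a real one would need more thought. */
    static const char *const text_exts[] = { ".txt", ".htm", ".html", ".csv" };

    assert(filename != NULL);

    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return UU_DISCARD;             /* no extension: assume binary */

    for (size_t i = 0; i < sizeof text_exts / sizeof text_exts[0]; i++) {
        if (ext_equal(dot, text_exts[i]))
            return UU_KEEP;
    }
    return UU_DISCARD;
}
```

With this list, "LK2540-7R.pdf" from case 1 would be discarded, while
something like "notes.TXT" would be kept and tokenized.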

David


