libgmime

Matthias Andree matthias.andree at gmx.de
Fri Dec 13 15:14:13 CET 2002


Greg Louis <glouis at dynamicro.on.ca> writes:

> On 20021213 (Fri) at 0230:17 +0100, Matthias Andree wrote:
>
>> >  --B_3121952318_1802999
>> >  Content-type: application/octet-stream"
>> >  Content-disposition: attachment
>> >  Content-transfer-encoding: x-uuencode
>> 
>> Mind the "x-" prefix; this is negotiated between sender and client, x-
>> means local extension, and is reserved for nonstandard extensions.
>> 
>> As this is application/octet-stream, we don't care anyways.
>
> Currently that's not true; the uuencoded data are read and a very large 
> number of meaningless tokens is generated.

I thought we had an "eat uuencode lines" rule in lexer.l now.
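
Roughly, such a rule has to recognize the shape of a uuencoded body line. The sketch below is in Python for brevity rather than as a flex rule, and it is not what lexer.l does. The function name and the two-character length slack are my assumptions. A line of random text that happens to fit this shape will match too.

```python
def looks_like_uuencode_line(line: str) -> bool:
    """Heuristic check: could `line` be one body line of uuencoded data?"""
    line = line.rstrip("\r\n")
    if not line:
        return False
    # The first character encodes the byte count n as chr(32 + n), 0 <= n <= 45;
    # many encoders write a backtick instead of a space for n == 0.
    n = 0 if line[0] == "`" else ord(line[0]) - 32
    if not 0 <= n <= 45:
        return False
    # Every character must come from the uuencode alphabet (space .. backtick).
    if any(not 32 <= ord(c) <= 96 for c in line):
        return False
    # n bytes are encoded as ceil(n / 3) groups of four characters.
    expected = 1 + 4 * ((n + 2) // 3)
    # Hypothetical slack: mailers may strip trailing spaces, and some
    # encoders append a checksum character.
    return abs(len(line) - expected) <= 2
```

For example, "#0V%T" (the classic uuencoding of "Cat") matches. "begin 644 file.bin", "end" and ordinary lowercase prose do not, so the begin/end lines still need their own rule.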

> These first distort the classification, and ultimately -- but it can
> take quite a while -- subside beneath the min_dev threshold,

How do we identify tokens that no longer contribute, and never will
again, to the spamicity? We will want a cron job that weeds these
entries out of the .db once a week.
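
One possible criterion for such a job, sketched in Python. It computes Robinson's smoothed f(w) from the robs/robx parameters and flags tokens that currently fall inside the min_dev band. I am assuming that band is centred on 0.5. Being inside the band only means "ignored now", so the extra max_count cutoff (rarely seen tokens only) is a hypothetical heuristic for "will not come back". The function names and the wordlist layout (token -> (spam count, good count)) are illustrative, not bogofilter's.

```python
def robinson_fw(spam, good, n_spam_msgs, n_good_msgs, robs, robx):
    """Robinson's smoothed spam probability f(w) for one token (assumes robs > 0)."""
    # Per-class frequencies, normalised by the number of trained messages per class.
    b = spam / n_spam_msgs if n_spam_msgs else 0.0
    g = good / n_good_msgs if n_good_msgs else 0.0
    n = spam + good
    # Fall back to the prior robx when the token has no usable counts.
    p = b / (b + g) if b + g > 0 else robx
    # Blend the observed p(w) with the prior robx, weighted by strength robs.
    return (robs * robx + n * p) / (robs + n)


def prune_candidates(wordlist, n_spam_msgs, n_good_msgs, robs, robx, min_dev, max_count):
    """Return (sorted) tokens that are ignored when scoring and rarely seen."""
    out = []
    for token, (spam, good) in wordlist.items():
        fw = robinson_fw(spam, good, n_spam_msgs, n_good_msgs, robs, robx)
        # Inside the min_dev band around 0.5 the token is skipped when scoring;
        # max_count is a hypothetical cutoff so frequently seen tokens are kept.
        if abs(fw - 0.5) < min_dev and spam + good <= max_count:
            out.append(token)
    return sorted(out)
```

A token seen once in spam and once in ham with equal corpus sizes lands exactly on 0.5 and is flagged. A token seen fifty times in spam only is kept.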

> thereafter occupying space in the wordlists to no purpose.  Of course,
> when bogofilter is able statefully to detect and skip over
> application/octet-stream attachments, the encoding will become
> irrelevant.

Bogofilter will never be able to distinguish uuencode from random
text. It may, however, be able to skip application/octet-stream
attachments altogether, without looking at the encoding. We certainly
do not need to decode that stuff: it is opaque binary data, or it would
have another data type. Yes, I know that many webmailers send ANY
attachment as application/octet-stream, but that burdens the user with
saving the attachment, figuring out the right application and launching
it, so it is not exactly "easy access" on the recipient side.
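
Skipping by type alone only needs the part's headers, not its body or its Content-Transfer-Encoding. Below is a hand-rolled sketch in Python, not bogofilter's MIME code. The function names are hypothetical. It unfolds continuation lines, drops parameters, and tolerates a stray quote like the one in the quoted header above. A part without Content-Type defaults to text/plain (RFC 2045), so it is not skipped.

```python
def part_content_type(header_lines):
    """Return the lowercased MIME type from a part's header lines, or None."""
    headers = {}
    current = None
    for line in header_lines:
        line = line.rstrip("\r\n")
        if line[:1] in (" ", "\t") and current is not None:
            # Folded continuation line: append to the previous header.
            headers[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip().lower()
            headers[current] = value.strip()
    ctype = headers.get("content-type")
    if ctype is None:
        return None
    # Drop parameters and tolerate stray quotes such as 'application/octet-stream"'.
    return ctype.split(";", 1)[0].strip().strip('"').lower()


def should_skip_part(header_lines):
    """Skip opaque binary parts without looking at their encoding."""
    return part_content_type(header_lines) == "application/octet-stream"
```

With the three header lines quoted at the top of this message, should_skip_part() returns True whether the body is x-uuencode, base64 or anything else.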

-- 
Matthias Andree
