base64 spam / forcing bogofilter -p judgement

Thu Nov 7 15:39:47 CET 2002

On Thursday, November 7, 2002, at 07:39  AM, Matthias Andree wrote:
> Allyn Fratkin <allyn at fratkin.com> writes:
>> i "assume" that someday bogofilter will understand mime/base64/q-p
>> but i suspect it will be a while as this is fairly complicated.
>
> Looks like we might just usurp Debian's mimedecode until we have our own
> character set canonicalization.

Wouldn't it be enough to have bogofilter understand word boundaries in 
base64?

I'm fairly new to this, but if I understand the theory correctly, the 
base64 encoding of spam words should be just as strong evidence of spam as 
the decoded word - perhaps more. In fact, some words might turn out to be 
"ham" words in plain text, but "spam" words when base64-encoded. Does that 
make sense?

That would also allow one to feed a large corpus of raw spam that is mixed 
(some encoded, some plain text) into bogofilter without needing to decode 
all the base64 first.

Or does it turn out that just understanding word boundaries is as hard as 
decoding the whole thing?
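
One thing that may make the boundaries hard to see without decoding: base64 
turns every group of 3 input bytes into 4 output characters, so the encoded 
form of a word depends on where the word falls within a 3-byte group. Here 
is a minimal Python sketch of that effect. The helper names are made up for 
illustration and have nothing to do with bogofilter's code, and real mail 
would also wrap the encoded text across lines, which this ignores.

```python
import base64


def encode_at_offset(word: str, offset: int) -> str:
    """Base64-encode `word` preceded by `offset` spaces and followed by one space."""
    text = " " * offset + word + " "
    return base64.b64encode(text.encode("ascii")).decode("ascii")


def alignment_variants(word: str) -> list[str]:
    """Encoded forms of `word` at the three possible positions in a 3-byte group."""
    return [encode_at_offset(word, offset) for offset in range(3)]


if __name__ == "__main__":
    # The same plain-text word yields three different encoded strings.
    for offset, encoded in enumerate(alignment_variants("money")):
        print(offset, encoded)
```

If that is right, a tokenizer working on the raw base64 would see up to 
three unrelated-looking tokens per word, and the space between two words 
does not map to any fixed character in the encoded text.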

pjm