base64 spam / forcing bogofilter -p judgement

Parker Morse morse at sinauer.com
Thu Nov 7 15:39:47 CET 2002


On Thursday, November 7, 2002, at 07:39  AM, Matthias Andree wrote:
> Allyn Fratkin <allyn at fratkin.com> writes:
>> i "assume" that someday bogofilter will understand mime/base64/q-p
>> but i suspect it will be a while as this is fairly complicated.
>
> Looks like we might just usurp Debian's mimedecode until we have our own
> character set canonicalization.

Wouldn't it be enough to have bogofilter understand word boundaries in 
base64?

I'm fairly new to this, but if I understand the theory correctly, the 
base64 encoding of spam words should be just as strong evidence of spam as 
the decoded word - perhaps more. In fact, some words might turn out to be 
"ham" words in plain text, but "spam" words when base64-encoded. Does that 
make sense?

That would also allow one to feed a large corpus of raw spam, mixed 
encoded and plain text, into bogofilter without needing to decode all 
the base64 first.

Or does it turn out that just understanding word boundaries is as hard as 
decoding the whole thing?
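
A quick way to poke at that question: base64 turns each group of three 
input bytes into four output characters, so how a word looks once encoded 
depends on where it falls relative to those three-byte groups. Below is a 
minimal Python sketch of that effect, not anything taken from bogofilter; 
the helper names (b64, forms_by_offset) and the filler text are made up for 
illustration.

```python
import base64


def b64(text):
    """Base64-encode an ASCII string and return the result as a string."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")


def forms_by_offset(word):
    """Encode `word` preceded by 0, 1 and 2 filler bytes.

    The filler shifts the word against base64's three-byte groups, so
    each entry shows how the same word looks at a different alignment.
    """
    return {pad: b64("x" * pad + word + " now") for pad in range(3)}


if __name__ == "__main__":
    # Hypothetical sample word; any word would do.
    for pad, encoded in forms_by_offset("free").items():
        print(pad, encoded)
```

If the three printed strings share no common piece for the word, then a 
single word has at least three encoded spellings, and there is no 
character in the encoded text that reliably marks a space. That would 
suggest finding word boundaries is not much easier than decoding.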

pjm