base64 spam / forcing bogofilter -p judgement
Parker Morse
morse at sinauer.com
Thu Nov 7 15:39:47 CET 2002
On Thursday, November 7, 2002, at 07:39 AM, Matthias Andree wrote:
> Allyn Fratkin <allyn at fratkin.com> writes:
>> i "assume" that someday bogofilter will understand mime/base64/q-p
>> but i suspect it will be a while as this is fairly complicated.
>
> Looks like we might just usurp Debian's mimedecode until we have our own
> character set canonicalization.
Wouldn't it be enough to have bogofilter understand word boundaries in
base64?
I'm fairly new to this, but if I understand the theory correctly, the
base64 encoding of spam words should be just as strong evidence of spam as
the decoded word - perhaps more. In fact, some words might turn out to be
"ham" words in plain text, but "spam" words when base64-encoded. Does that
make sense?
That would also allow one to feed a large corpus of raw spam that is mixed
(encoded and plain-text) into bogofilter without needing to decode all
the base64 first.
Or does it turn out that just understanding word boundaries is as hard as
decoding the whole thing?
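
One way to probe that last question is to check whether a word keeps the same
encoded form wherever it falls in a message. Base64 packs every 3 input bytes
into 4 output characters, so the encoding of a word should depend on its byte
offset modulo 3. The following is only a minimal sketch using Python's
standard base64 module; the helper names and the sample word are made up for
illustration and have nothing to do with bogofilter's code.

```python
import base64


def encode(text: str) -> str:
    """Return the base64 encoding of an ASCII string, as a string."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")


def encodings_at_offsets(word: str) -> list[str]:
    """Encode `word` preceded by 0, 1 and 2 filler bytes.

    The filler shifts the word to each possible position within
    base64's 3-byte groups.
    """
    return [encode("x" * pad + word) for pad in range(3)]


if __name__ == "__main__":
    # Hypothetical sample word; any word of a few letters behaves the same way.
    for pad, encoded in enumerate(encodings_at_offsets("viagra")):
        print(pad, encoded)
```

If the three printed strings differ, a tokenizer reading raw base64 would see
one plain-text word as several unrelated tokens, and the neighbouring
characters (spaces included) would change them further. That would suggest
that finding word boundaries is not much easier than decoding.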
pjm