base64 spam / forcing bogofilter -p judgement

Allyn Fratkin allyn at fratkin.com
Thu Nov 7 16:58:59 CET 2002


Parker Morse wrote:

> Wouldn't it be enough to have bogofilter understand word boundaries in
> base64?

there aren't word boundaries in the normal sense in base64.  each line of
base64 data often looks like one unbroken 60-character string.  in other
words, a single word.
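
As a rough illustration of the point above, the sketch below encodes some ordinary text as base64 and then splits it on whitespace, the way a naive tokenizer would. The sample text and the helper name base64_tokens are made up for this example, and this is not bogofilter's actual lexer. Note that Python's base64.encodebytes wraps lines at 76 characters; the line length seen in real mail depends on the sending software.

```python
import base64


def base64_tokens(text: str) -> list[str]:
    """Encode text as MIME-style base64, then split it on whitespace.

    Hypothetical helper for illustration only; a real spam filter's
    tokenizer is more involved than str.split().
    """
    encoded = base64.encodebytes(text.encode("utf-8")).decode("ascii")
    return encoded.split()


# Made-up sample message body: 40 short words, 200 bytes.
plain = "buy cheap pills now " * 10
tokens = base64_tokens(plain)

print(len(plain.split()), "words in the plain text")
print(len(tokens), "tokens in the base64 form")
for token in tokens:
    # Each encoded line contains no spaces, so it is one long "word".
    print(len(token), token)
```

The only whitespace in the encoded form is the line break, so each line comes out as one long token with no relation to the words in the original text.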

the problem is that these words repeat so infrequently that they cause major
bloat in your word lists for very little value.  for example, an early version
of bogofilter didn't recognize/discard base64 in my corpus mailboxes.  so
instead of being under 2MB, my goodlist.db file was over 50MB.  that makes it
completely unwieldy for my use, which includes copying it over the internet
and storing it at my "outsourced" web/email service provider.
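
To make the bloat argument concrete, here is a small sketch that counts how often base64 "words" repeat across a batch of messages. Everything in it is an assumption for illustration: the payloads are random bytes standing in for attachments, the message count and sizes are arbitrary, and a Counter stands in for the word list. It says nothing about how bogofilter stores its databases.

```python
import base64
import random
from collections import Counter


def tokens_from_base64(payload: bytes) -> list[str]:
    """Split a MIME-style base64 encoding of payload on whitespace."""
    return base64.encodebytes(payload).decode("ascii").split()


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter()      # stand-in for a word list: token -> occurrences

# 100 hypothetical messages, each with a 570-byte binary attachment.
for _ in range(100):
    payload = bytes(rng.getrandbits(8) for _ in range(570))
    counts.update(tokens_from_base64(payload))

total = sum(counts.values())
seen_once = sum(1 for n in counts.values() if n == 1)

print("tokens stored:        ", total)
print("distinct tokens:      ", len(counts))
print("tokens seen only once:", seen_once)
```

With random payloads like these, essentially every token is distinct, so each one adds an entry to the list and is never matched again. Ordinary text behaves differently because the same vocabulary keeps recurring from message to message.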

-- 
Allyn Fratkin             allyn at fratkin.com
Escondido, CA             http://www.fratkin.com/




