bogofilter resistant email

Chris Wilkes cwilkes-bf at ladro.com
Sat Feb 14 00:12:12 CET 2004


On Fri, Feb 13, 2004 at 05:46:52PM -0500, David Relson wrote:
> 
> > I don't think the spammishness of the low count words can compete. 
> > And if I re-register enough times so that the ham words are more
> > spammy, then I fear getting false positives.
> 
> Yep.  You've got a tough one there.  Based on the info bogofilter has,
> the message isn't spam.  It's missing too many of the earmarks - no
> mortgage rates, no "make it bigger", etc.

I experienced the same problems and that's why I wrote the "BF
Normalizer" (scaler is a better term) script that I posted a while back.

This script scales all the word counts down to a maximum number, so when
an email's classification is corrected, the correction is more likely to
change its spamicity: with smaller counts, slight changes make big
differences.
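A simplified worked example shows the effect. Take a token seen 10 times in spam and 300 times in ham (invented numbers) and use the plain ratio f = s/(s+g) as a stand-in for spamicity. Bogofilter's real per-token calculation (Robinson's method, with message-count normalization and a prior) is more involved, so treat this only as an illustration. Capping at 50 shrinks both counts by 50/300, so 10 becomes about 1.67, rounded here to 2. Registering one message as spam adds 1 to s in either case:

```latex
\begin{aligned}
\text{raw counts:}\quad & \frac{10}{10+300} \approx 0.0323 \;\longrightarrow\; \frac{11}{11+300} \approx 0.0354, \quad \Delta f \approx 0.003 \\
\text{capped at 50:}\quad & \frac{2}{2+50} \approx 0.0385 \;\longrightarrow\; \frac{3}{3+50} \approx 0.0566, \quad \Delta f \approx 0.018
\end{aligned}
```

The same single correction moves the capped token roughly six times as far.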

Use it like this:
  bogoutil -d wordlist.db | bfnormalize.pl 50 | bogoutil -l new.db
And the file new.db is the "scaled" version.

David pointed out that I was also changing the number of emails stored in
the .MSG_COUNT token, so I've since updated the script to ignore that one.
I only just made that change and haven't tested what it does to people's
scores, so you might want to try it both with and without that token
scaled.
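For anyone who wants to see the idea in code, here is a minimal Python sketch of such a scaler. It is not bfnormalize.pl. It assumes one plausible reading of "scaled to a maximum number": any token whose larger count exceeds the cap has both counts shrunk proportionally so the larger one equals the cap. It also assumes the `bogoutil -d` dump is one whitespace-separated `token spamcount goodcount [extra fields]` line per token, and it passes the .MSG_COUNT line through unchanged. The file name scale_counts.py is hypothetical.

```python
import sys


def scale_line(line: str, cap: int) -> str:
    """Scale one dump line so neither count exceeds `cap`.

    Hypothetical reading of what a scaler like bfnormalize.pl does,
    not its actual code.
    """
    fields = line.split()
    # ASSUMPTION: dump lines look like "token spamcount goodcount [extra...]".
    # Short lines and the .MSG_COUNT token are passed through untouched.
    if len(fields) < 3 or fields[0] == ".MSG_COUNT":
        return line.rstrip("\n")
    token, extra = fields[0], fields[3:]
    spam, good = int(fields[1]), int(fields[2])
    biggest = max(spam, good)
    if biggest > cap:
        # Shrink both counts by the same factor, preserving their ratio.
        # Nonzero counts stay at least 1 so rare tokens are not dropped.
        spam = max(1, round(spam * cap / biggest)) if spam else 0
        good = max(1, round(good * cap / biggest)) if good else 0
    return " ".join([token, str(spam), str(good), *extra])


def main() -> None:
    # The cap comes from the first argument, defaulting to 50.
    cap = int(sys.argv[1]) if len(sys.argv) > 1 else 50
    for line in sys.stdin:
        print(scale_line(line, cap))


if __name__ == "__main__":
    main()
```

It slots into the same pipeline:
  bogoutil -d wordlist.db | python3 scale_counts.py 50 | bogoutil -l new.db
Deleting the .MSG_COUNT check in this sketch would scale that token too, for the "with it in there" comparison.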

I run it for any of my users here who complain that bogofilter isn't
taking their corrections.  For example, one user submitted 350 emails to
be corrected as spam, and only 120 of them were labeled correctly after
running them through -Ns and -s.  I ran the scaling script, and then all
350 emails scored as spam.

Am I breaking some Bayesian rules by doing this?  Maybe.  But it works
:)  Give it a shot by making a new bogofilter directory and checking
whether the spam is scored correctly:
  mkdir /tmp/newbf
  bogoutil -d "$BOGOFILTER_DIR/wordlist.db" | bfnormalize.pl 50 \
    | bogoutil -l /tmp/newbf/wordlist.db
  bogofilter -v -B /tmp/bademails/*    # score with the original wordlist
  export BOGOFILTER_DIR=/tmp/newbf
  bogofilter -v -B /tmp/bademails/*    # score again with the scaled copy


The script is here:
  http://ladro.com/bf/bfnormalize.pl

Chris
