bogofilter resistant email

Sat Feb 14 00:54:10 CET 2004

On Fri, 13 Feb 2004 15:12:12 -0800
Chris Wilkes wrote:

> On Fri, Feb 13, 2004 at 05:46:52PM -0500, David Relson wrote:

...[snip]...

> I experienced the same problems and that's why I wrote the "BF
> Normalizer" (scaler is a better term) script that I posted a while
> back.
> 
> This script scales all the word counts to a maximum number, so that
> correcting an email's classification is more likely to change its
> spamicity: slight changes in word counts now make a big difference.
> 
> Use it like this:
>   bogoutil -d wordlist.db | bfnormalize.pl 50 | bogoutil -l new.db
> And the file new.db is the "scaled" version.
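
As a rough illustration of the scaling idea described above, here is a
minimal Python sketch.  It is not the actual bfnormalize.pl, whose source
is not shown in this message.  It assumes that each line of the
"bogoutil -d" dump has the form "token spamcount goodcount [date]", and
that "scaling to a maximum" means multiplying every count by one common
factor so that the largest count equals the given limit.  The function
name scale_counts is hypothetical.  The .MSG_COUNT token is passed
through unchanged, following the fix mentioned later in this thread.

```python
MSG_COUNT = ".MSG_COUNT"  # bogofilter's message-count token; left unscaled


def scale_counts(lines, max_count):
    """Scale spam/good counts so that the largest count equals max_count.

    Assumes each line looks like "token spamcount goodcount [date]".
    Lines with fewer than three fields are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        token, spam, good = fields[0], int(fields[1]), int(fields[2])
        rows.append((token, spam, good, fields[3:]))

    # Largest count over all ordinary tokens (the message count is excluded).
    peak = max(
        (max(spam, good) for token, spam, good, _ in rows if token != MSG_COUNT),
        default=0,
    )
    # Only scale down; a small wordlist is left as it is.
    factor = max_count / peak if peak > max_count else 1.0

    scaled = []
    for token, spam, good, rest in rows:
        if token != MSG_COUNT:
            spam = round(spam * factor)
            good = round(good * factor)
        scaled.append(" ".join([token, str(spam), str(good), *rest]))
    return scaled
```

With smaller counts in the wordlist, registering a single correction
shifts a token's spam/good ratio by a larger amount, which is the effect
described above.  Whether the real script rounds, truncates, or drops
tokens whose counts fall to zero is not stated here, so treat those
details as guesses.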

Chris,

A very timely opportunity to present your script.  If you'd care to add
some documentation at the start, we can put it in bogofilter/contrib/.

FWIW, bfnormalize.pl outputs to STDERR.  You might want to change that
:-)

> David pointed out that I was also changing the number of emails stored
> in the .MSG_COUNT token, so I've since updated the script to ignore
> that one.  I just put that in and I haven't tested to see what it does
> to people's scores, so you might want to try with it in there and
> without.
> 
> I run it on any of my users here that complain bogofilter isn't taking
> their corrections.  For example, a user put in 350 emails to be
> corrected as spam and only 120 of them were correctly labeled as such
> after running through a -Ns and -s.  I ran the scaling script and then
> all 350 emails scored as spam.
> 
> Am I breaking some Bayesian rules by doing this?  Maybe.  But it works
> :)  Give it a shot by making a new bogofilter directory and seeing
> whether spam is correctly scored:

No doubt you've broken the rules :-)  One of the strengths of
Bayesian-style filters is that they're very tolerant.  Even Gary Robinson
gives value to "if it works ..."