bogofilter resistant email
David Relson
relson at osagesoftware.com
Sat Feb 14 00:54:10 CET 2004
On Fri, 13 Feb 2004 15:12:12 -0800
Chris Wilkes wrote:
> On Fri, Feb 13, 2004 at 05:46:52PM -0500, David Relson wrote:
...[snip]...
> I experienced the same problems and that's why I wrote the "BF
> Normalizer" (scaler is a better term) script that I posted a while
> back.
>
> This script scaled all the word counts down to a maximum number, so
> that correcting an email's classification is now more likely to change
> its spamicity: slight changes in word counts make a big difference.
>
> Use it like this:
> bogoutil -d wordlist.db | bfnormalize.pl 50 | bogoutil -l new.db
> And the file new.db is the "scaled" version.
Chris,
A very timely opportunity to present your script. If you'd care to add
some documentation at the start, we can put it in bogofilter/contrib/.
FWIW, bfnormalize.pl outputs to STDERR. You might want to change that
:-)
> David pointed out that I was also changing the number of emails stored
> in the .MSG_COUNT token; I've since updated the script to ignore that
> one. I just put that in and haven't tested to see what it does to
> people's scores, so you might want to try it both with that change
> and without.
>
> I run it for any of my users here who complain that bogofilter isn't
> taking their corrections. For example, a user submitted 350 emails to
> be corrected as spam, and only 120 of them were correctly labeled as
> such after running through a -Ns and -s. I ran the scaling script and
> then all 350 emails scored as spam.
>
> Am I breaking some bayesian rules by doing this? Maybe. But it works
> :) Give it a shot by making a new bogofilter directory and seeing if
> spam is correctly scored:
No doubt you've broken the rules :-) One of the strengths of bayesian
style filters is that they're very tolerant. Even Gary Robinson gives
value to "if it works ..."
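
For readers who want to see the idea in code, here is a minimal sketch of
the kind of scaling Chris describes. It is not his bfnormalize.pl, which
is not reproduced in this message. It assumes each line dumped by
"bogoutil -d" has the form "token spamcount goodcount [date]", that the
scaling is linear so the largest count becomes the cap, and that a nonzero
count should stay at least 1. The function names scale_counts and
scale_one are made up for this illustration. The .MSG_COUNT token is
passed through untouched, as in Chris's updated script.

```python
def scale_one(count, factor):
    """Scale one count; keep a nonzero count at 1 or more (an assumption)."""
    if count == 0:
        return 0
    return max(1, round(count * factor))


def scale_counts(lines, cap):
    """Linearly scale spam/good counts so the largest one equals cap.

    Assumes lines look like "token spamcount goodcount [date]".
    Lines with fewer than three fields are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        rows.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))

    # .MSG_COUNT holds message totals, not word counts, so leave it out
    # when looking for the largest count.
    largest = max(
        (max(spam, good) for tok, spam, good, _ in rows if tok != ".MSG_COUNT"),
        default=0,
    )
    # Only scale down; a list already under the cap is left alone.
    factor = cap / largest if largest > cap else 1.0

    out = []
    for tok, spam, good, rest in rows:
        if tok != ".MSG_COUNT":
            spam = scale_one(spam, factor)
            good = scale_one(good, factor)
        out.append(" ".join([tok, str(spam), str(good), *rest]))
    return out
```

With a cap of 50, a token seen 400 times in spam and twice in ham ends up
at 50 and 1, so a handful of corrections can shift its ratio noticeably.
To use something like this in the pipeline quoted above, the result would
be written to STDOUT so that "bogoutil -l" can read it.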