spam goes through
David Relson
relson at osagesoftware.com
Mon Sep 3 01:12:50 CEST 2007
On Mon, 3 Sep 2007 02:33:01 +0400
Dmitry wrote:
> Hello!
>
> I can't find good remedy against "bigger pe*nis" kind of spam ;)
>
> Bogofilter seems to be very ineffective fighting with such spam even
> after training on hundreds of sample letters with similar phrases.
> Spammicity never gets past the 0.97% for new spam letters.
>
> Diagnostics:
>
> $ bogoutil -p ~/.bogofilter/wordlist.db .MSG_COUNT
> spam good Fisher
> .MSG_COUNT 3286 32 0.500000
>
> $ bogoutil -p ~/.bogofilter/wordlist.db 'pe*nis'
> spam good Fisher
> (nothing here, why?)
>
> $ bogoutil -p ~/.bogofilter/wordlist.db penis
> spam good Fisher
> penis 12 0 0.999289
>
> I am running bogofilter-sqlite version 1.1.5.
>
> bogofilter.cf changed parameters:
> unicode=yes
> block_on_subnets=yes
> ham_cutoff = 0.45
> spam_cutoff= 0.97
>
> --
> vdb
Dmitry,
I see you're experimenting with lower cutoff values. This is good.
As distributed, bogofilter uses _very_ conservative cutoff values in
order to minimize the likelihood of false positives. Feel free to
experiment with an even lower spam_cutoff than you already have -- but
be sure to check that you've not caused lots of false positives.
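
To make the role of the two cutoffs concrete, here is a minimal Python
sketch. It is not bogofilter's own code: the function names classify and
false_positives and the sample scores are hypothetical, and the boundary
handling (>= and <=) is an assumption, so check the bogofilter
documentation for the exact comparison.

```python
def classify(spamicity, ham_cutoff=0.45, spam_cutoff=0.97):
    """Three-way decision from a spamicity score in [0, 1].

    Defaults mirror the cutoffs quoted in the message above.
    The inclusive comparisons are an assumption of this sketch.
    """
    if spamicity >= spam_cutoff:
        return "spam"
    if spamicity <= ham_cutoff:
        return "ham"
    return "unsure"


def false_positives(ham_scores, spam_cutoff):
    """Count known-good messages that would be tagged as spam."""
    return sum(1 for score in ham_scores if score >= spam_cutoff)


# Hypothetical scores for messages already known to be ham.
known_ham_scores = [0.02, 0.10, 0.31, 0.58, 0.66]

# Try progressively lower spam cutoffs on a message scoring 0.85
# and report how many known-good messages each cutoff would catch.
for cutoff in (0.97, 0.80, 0.60):
    print(cutoff, classify(0.85, spam_cutoff=cutoff),
          false_positives(known_ham_scores, cutoff))
```

With these made-up numbers, lowering the cutoff from 0.97 to 0.80 moves
a message scoring 0.85 from unsure to spam at no cost, while 0.60 would
also misclassify one of the sample ham messages.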
"pe*nis" doesn't show up because bogofilter doesn't allow '*' in tokens,
so "pe*nis" is not in the wordlist. To see the tokens that are
generated, try running:

  echo "pe*nis" | bogolexer -p -H

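
The effect of a disallowed character can be illustrated with a toy
tokenizer in Python. This is only a sketch and not bogofilter's lexer:
toy_tokenize is a hypothetical helper that treats just whitespace and
'*' as separators, and the real bogolexer output may differ (for
example in how it handles very short fragments).

```python
import re


def toy_tokenize(text):
    """Split on whitespace and '*'. NOT bogofilter's real lexer."""
    return [tok for tok in re.split(r"[\s*]+", text) if tok]


# Wordlist entry taken from the bogoutil output quoted above:
# token -> (spam count, good count).
wordlist = {"penis": (12, 0)}

tokens = toy_tokenize("pe*nis")
print(tokens)                 # the fragments on either side of '*'
print("pe*nis" in wordlist)   # the mangled string is never a key
```

Because the string is broken at the '*', a lookup for the literal
"pe*nis" finds nothing even though "penis" itself is in the wordlist.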
Several years ago I applied the bogotune utility to large collections
of _my_ spam and of _my_ ham. For my messages it recommended a
ham_cutoff slightly lower than you're using and a spam_cutoff _much_
lower than you use. Bogofilter is working well for me in catching
all the mangled spellings that spammers are trying. For what it's
worth, my spam and ham cutoff values differ by less than 0.20.
HTH,
David