spam goes through

David Relson relson at osagesoftware.com
Mon Sep 3 01:12:50 CEST 2007


On Mon, 3 Sep 2007 02:33:01 +0400
Dmitry wrote:

> Hello!
> 
> I can't find a good remedy against the "bigger pe*nis" kind of spam ;) 
> 
> Bogofilter seems to be very ineffective at fighting such spam, even
> after training on hundreds of sample letters with similar phrases.
> Spamicity never gets past 0.97 for new spam letters.
> 
> Diagnostics:
> 
> $ bogoutil -p ~/.bogofilter/wordlist.db .MSG_COUNT
>                                  spam    good    Fisher
> .MSG_COUNT                       3286      32  0.500000
> 
> $ bogoutil -p ~/.bogofilter/wordlist.db 'pe*nis'   
>                                  spam    good    Fisher 
> (nothing here, why?)
> 
> $ bogoutil -p ~/.bogofilter/wordlist.db penis
>                                  spam    good    Fisher
> penis                              12       0  0.999289
> 
> I am running bogofilter-sqlite version 1.1.5.
> 
> bogofilter.cf changed parameters:
> unicode=yes
> block_on_subnets=yes
> ham_cutoff = 0.45
> spam_cutoff= 0.97
> 
> -- 
> vdb

Dmitry,

I see you're experimenting with lower cutoff values.  This is good.
As distributed, bogofilter uses _very_ conservative cutoff values in
order to minimize the likelihood of false positives.  Feel free to
experiment with an even lower spam_cutoff than you already have, but
be sure to check that you haven't caused lots of false positives.
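
The cutoffs work as a three-way split of the spamicity score.  The
sketch below is a minimal illustration, assuming that scores at or
above spam_cutoff are spam, scores at or below ham_cutoff are ham, and
everything in between is "unsure"; the defaults are the values from
Dmitry's bogofilter.cf, and the function name classify is hypothetical.
It shows why a message scoring just under 0.97 never gets flagged with
his settings.

```python
# Hypothetical sketch of a three-way verdict from a spamicity score.
# Assumption: score >= spam_cutoff -> spam, score <= ham_cutoff -> ham,
# anything in between -> unsure. Check the bogofilter man page for the
# exact boundary handling.

def classify(score: float, ham_cutoff: float = 0.45, spam_cutoff: float = 0.97) -> str:
    """Map a spamicity score in [0, 1] to 'spam', 'ham' or 'unsure'."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    # Just under Dmitry's spam_cutoff, the message is only "unsure".
    print(classify(0.965))                    # unsure
    # With a (hypothetical) lower spam_cutoff the same score is spam.
    print(classify(0.965, spam_cutoff=0.90))  # spam
```

Lowering spam_cutoff widens the spam band at the expense of the
"unsure" band, which is exactly where the false-positive check matters.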

"pe*nis" doesn't show because bogofilter doesn't allow '*' in tokens,
so "pe*nis" is not in the wordlist.  To see the tokens generated, try
running:

    echo "pe*nis" | bogolexer -p -H
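
To make the point concrete, here is a toy tokenizer that treats '*'
as a separator.  This is an assumption for illustration only, not
bogofilter's real lexer, which has its own rules for allowed
characters, minimum token length and header tagging; bogolexer remains
the authority on what is actually produced.  The names TOKEN_RE and
toy_tokens are hypothetical.

```python
import re

# Toy tokenizer: runs of ASCII letters and digits are tokens, everything
# else (including '*') separates tokens. NOT bogofilter's real lexer.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def toy_tokens(text: str) -> list:
    """Return the tokens a simple '*'-rejecting lexer would produce."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(toy_tokens("pe*nis"))  # ['pe', 'nis']
    print(toy_tokens("penis"))   # ['penis']
```

Because the literal string "pe*nis" is never emitted as a single
token, a bogoutil lookup for it has nothing to find.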

Several years ago I applied the bogotune utility to large collections
of _my_ spam and of _my_ ham.  For my messages it recommended a
ham_cutoff slightly lower than you're using and a spam_cutoff _much_
lower than you use.  Bogofilter is working well for me in catching
all the mangled spellings that spammers are trying.  For what it's
worth, my spam and ham cutoff values differ by less than 0.20.
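
As a rough idea of what tuning against your own mail involves, the
sketch below scans a grid of candidate spam_cutoff values and returns
the lowest one that flags none of your known ham.  This is a
hypothetical simplification, NOT bogotune's actual algorithm (which
also tunes ham_cutoff and other parameters), and the scores in the
example are made up for illustration.

```python
# Hypothetical sketch, NOT bogotune's algorithm: given spamicity scores
# for hand-labelled ham and spam, find the lowest spam_cutoff on a fixed
# grid that produces zero false positives on that sample.

def lowest_safe_spam_cutoff(ham_scores, spam_scores, step=0.01):
    """Return (cutoff, spam_caught), or (None, 0) if no grid value works."""
    max_ham = max(ham_scores)
    steps = int(round(1 / step))
    for i in range(steps + 1):
        cutoff = round(i * step, 10)
        # A cutoff strictly above every ham score flags no ham as spam,
        # since messages with score >= cutoff are treated as spam.
        if cutoff > max_ham:
            caught = sum(1 for s in spam_scores if s >= cutoff)
            return cutoff, caught
    return None, 0


if __name__ == "__main__":
    ham = [0.01, 0.05, 0.12, 0.31]         # made-up ham scores
    spam = [0.55, 0.62, 0.96, 0.99, 1.0]   # made-up spam scores
    print(lowest_safe_spam_cutoff(ham, spam))  # (0.32, 5)
    print(sum(1 for s in spam if s >= 0.97))   # 2 caught at 0.97
```

Zero false positives on a sample is no guarantee for future mail, so in
practice you would leave some margin above the highest ham score.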

HTH,

David
