spam goes through

Tom Anderson tanderso at oac-design.com
Wed Sep 5 19:42:53 CEST 2007


Dmitry wrote:
>  On Monday 03 September 2007 03:12, David Relson wrote:
>> "pe*nis" doesn't show because bogofilter doesn't allow '*' in tokens,
>> so "pe*nis" is not in the wordlist.  Try running "echo "pe*nis" |
>> bogolexer -p -H" to see the tokens
>> generated.
> 
> David, I see... And that is the key problem, then. Now I understand why the
> spam with asterisks is so problematic. Consider the following example:
> 
> $ echo "no*goodword" | bogolexer -p -H
> goodword
> 
> This could be a way for spammers to trick out  bogofilter forcing to count
> good words as bad words. Am I right?

Not really.  "Goodword" simply becomes more neutral and less of a "good 
word".  Your hams shouldn't suffer because they still have plenty of 
tokens which are substantially hammy.  And the spams lose their newfound 
edge after just a few training rounds.  As David advised, changing your 
configuration settings is the most important defense against an 
onslaught of false negatives.  Lower your cutoffs until you start 
getting false positives, then raise them slightly.  These are the values 
I've been using for years now:

robx=0.41
robs=0.2
min_dev=0.2
spam_cutoff=0.5
ham_cutoff=0.1
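
To make the "becomes more neutral" point concrete, here is a rough sketch 
of how a token's score could shift once spammers start using it.  It 
assumes the Robinson-style smoothing f(w) = (s*x + n*p) / (s + n), with 
s = robs and x = robx, and treats min_dev as a simple distance-from-0.5 
test.  The function names and the message counts are made up for 
illustration; this is not bogofilter's actual code.

```python
def token_score(spam_count, ham_count, spam_msgs, ham_msgs,
                robx=0.41, robs=0.2):
    """Smoothed spamicity of one token (simplified Robinson formula)."""
    n = spam_count + ham_count
    if n == 0:
        # Never-seen token: fall back to the assumed prior robx.
        return robx
    spam_freq = spam_count / spam_msgs
    ham_freq = ham_count / ham_msgs
    p = spam_freq / (spam_freq + ham_freq)
    # robs controls how strongly the prior pulls on rarely seen tokens.
    return (robs * robx + n * p) / (robs + n)


def is_counted(score, min_dev=0.2):
    """Assumed min_dev rule: ignore tokens too close to the neutral 0.5."""
    return abs(score - 0.5) >= min_dev


# Hypothetical corpus: 1000 spams and 1000 hams already trained.
before = token_score(spam_count=0, ham_count=50, spam_msgs=1000, ham_msgs=1000)
# The same token after 30 spams containing "no*goodword" were trained.
after = token_score(spam_count=30, ham_count=50, spam_msgs=1000, ham_msgs=1000)

print(round(before, 3), is_counted(before))  # strongly hammy, counted
print(round(after, 3), is_counted(after))    # about 0.375, inside min_dev
```

With these made-up numbers "goodword" never turns into a spammy token.  
It drifts toward 0.5 and drops out of the calculation, so the message is 
judged on its remaining tokens.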

I get zero false positives and only a few unsures (mostly spam) and 
false negatives.  My spam cutoff used to be a bit lower, and I could 
very likely lower it much further, but I've been experimenting with 
bogofilter-milter lately and would rather get a few more unsures than 
bounce any false positives.  Even so, I never see any "pe*nis" spams.  
The only false negatives I've been getting lately have been some stock 
advertisements, and that's because I receive hams about investments 
too.  Even those spams are dying down after the initial blast.  
Training on error is key.
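
As a rough illustration of how the two cutoffs and training on error fit 
together, here is a minimal sketch.  It assumes a score at or above 
spam_cutoff is spam, a score at or below ham_cutoff is ham, and anything 
in between is unsure.  The helper names and the sample scores are 
hypothetical, not bogofilter output.

```python
def classify(score, spam_cutoff=0.5, ham_cutoff=0.1):
    """Three-way verdict from a message score (assumed cutoff semantics)."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def needs_training(score, true_label):
    """Train on error: only feed back messages the filter got wrong
    or could not decide on."""
    return classify(score) != true_label


# Made-up (score, true label) pairs for illustration.
samples = [(0.97, "spam"), (0.03, "ham"), (0.30, "spam"), (0.08, "spam")]
to_train = [(s, label) for s, label in samples if needs_training(s, label)]
print(to_train)
```

Only the unsure spam and the false negative end up in the training list; 
the messages that were already classified correctly are left alone.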

Tom




More information about the Bogofilter mailing list