spam goes through
Tom Anderson
tanderso at oac-design.com
Wed Sep 5 19:42:53 CEST 2007
Dmitry wrote:
> On Monday 03 September 2007 03:12, David Relson wrote:
>> "pe*nis" doesn't show because bogofilter doesn't allow '*' in tokens,
>> so "pe*nis" is not in the wordlist. Try running
>> 'echo "pe*nis" | bogolexer -p -H' to see the tokens generated.
>
> David, I see... And that is the key problem, then. Now I understand why the
> spam with asterisks is so problematic. Consider the following example:
>
> $ echo "no*goodword" | bogolexer -p -H
> goodword
>
> This could be a way for spammers to trick bogofilter into counting
> good words as bad words. Am I right?

Not really. "Goodword" simply becomes more neutral and less of a "good
word". Your hams shouldn't suffer because they still have plenty of
tokens which are substantially hammy. And the spams lose their newfound
edge after just a few training rounds.

As David advised, changing your configuration settings is the most
important defense against an onslaught of false negatives. Lower your
cutoffs until you start getting false positives, then raise them
slightly. These are the values I've been using for years now:

robx=0.41
robs=0.2
min_dev=0.2
spam_cutoff=0.5
ham_cutoff=0.1
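
To make the roles of these settings concrete, here is a rough Python sketch of how I understand them to interact. It is a simplification, not bogofilter's actual code: the function names are made up for illustration, the raw per-token estimate p and the final message score are taken as given, and the step that combines token scores into a message score is left out entirely. The exact handling of boundary values is also an assumption.

```python
# Values from the settings above. Function names are illustrative only.
ROBX = 0.41
ROBS = 0.2
MIN_DEV = 0.2
SPAM_CUTOFF = 0.5
HAM_CUTOFF = 0.1


def smoothed_prob(p, n, robx=ROBX, robs=ROBS):
    """Robinson-style smoothing: f(w) = (robs*robx + n*p) / (robs + n).

    p is the raw spam estimate for a token and n is how often the token
    has been seen. With n == 0 the result is simply robx.
    """
    return (robs * robx + n * p) / (robs + n)


def is_scored(fw, min_dev=MIN_DEV):
    """Tokens closer than min_dev to 0.5 are too neutral to count."""
    return abs(fw - 0.5) >= min_dev


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Three-way split of a message score: spam, ham, or unsure."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"
```

Under these assumptions, a token that shows up about equally in ham and spam (p near 0.5) falls inside the min_dev band and is ignored, which is the sense in which "goodword" becomes neutral rather than bad.
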
I get zero false positives and just a few unsures (mostly spam) and
false negatives. Actually, my spam cutoff used to be a bit lower and I
could very likely lower the spam cutoff much further, but I've been
experimenting with bogofilter-milter lately and would rather get a few
more unsures than bounce any false positives. But even here I never see
any "pe*nis" spams. The only false negatives I've been getting lately
have been some stock advertisements, but that's because I receive hams
about investments too. But even those spams are dying down after the
initial blast. Training on error is key.
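
For anyone who wants to script it, training on error can be sketched roughly like this in Python. The helper names are hypothetical. The sketch assumes bogofilter's documented exit codes (0 spam, 1 ham, 2 unsure) and the -s / -n registration flags, and it treats an "unsure" result as an error worth training on, which is a choice rather than a rule.

```python
import subprocess


def bogofilter_classify(message: bytes) -> str:
    """Classify one message via bogofilter's exit code (assumed: 0/1/2)."""
    result = subprocess.run(["bogofilter"], input=message)
    return {0: "spam", 1: "ham", 2: "unsure"}.get(result.returncode, "error")


def bogofilter_register(message: bytes, label: str) -> None:
    """Register one message as spam (-s) or ham (-n)."""
    flag = "-s" if label == "spam" else "-n"
    subprocess.run(["bogofilter", flag], input=message, check=True)


def train_on_error(messages, classify=bogofilter_classify,
                   register=bogofilter_register):
    """Register only the messages the filter currently gets wrong.

    messages is an iterable of (raw_message, true_label) pairs, where
    true_label is "spam" or "ham". Returns how many were registered.
    """
    trained = 0
    for message, true_label in messages:
        if classify(message) != true_label:
            register(message, true_label)
            trained += 1
    return trained
```

The classify and register arguments can be swapped out, so the loop can be tried on a few hand-labelled messages without touching the real wordlist.
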
Tom