Getting rid of plain obvious spam

David Relson relson at osagesoftware.com
Wed Apr 7 13:36:24 CEST 2004


On Wed, 07 Apr 2004 12:04:52 +0200
Boris 'pi' Piwinger wrote:

> Andreas Pardeike wrote:

...[snip]...

> It looks like you are in good shape. Your parameters are probably too
> strict:
> 
> > bogofilter -vvv < viagra01.txt 
> > X-Bogosity: No, tests=bogofilter, spamicity=0.987342, version=0.17.5
> 
> This is a very high value. It is still not rated as spam. Why?
> 
> > bogofilter -Q
> > # bogofilter version 0.17.5
> > 
> > robx        = 0.644661  # (6.45e-01)
> > robs        = 0.017800  # (1.78e-02)
> > min_dev     = 0.375000  # (3.75e-01)
> > ham_cutoff  = 0.000000  # (0.00e+00)
> > spam_cutoff = 0.990000  # (9.90e-01)
> 
> The answer is here. You are extremely strict about what you call
> spam. While this helps make sure you don't get false
> positives, you will get a lot of false negatives, as in your
> example. You need to find out which cutoff is still safe for
> you but still catches a lot of spam.
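The "Why?" above comes down to a single comparison against spam_cutoff. Here is a minimal Python sketch of that comparison. It assumes a message counts as spam when spamicity >= spam_cutoff, that a ham_cutoff of 0 gives two-state Yes/No output, and that a positive ham_cutoff adds an "Unsure" band. The function name bogosity() is hypothetical, and this is not bogofilter's own code.

```python
def bogosity(spamicity, spam_cutoff, ham_cutoff=0.0):
    """Hypothetical sketch of a cutoff test yielding an X-Bogosity verdict.

    Assumption: spam means spamicity >= spam_cutoff; ham_cutoff == 0 gives
    two-state Yes/No output, and a positive ham_cutoff adds "Unsure".
    """
    if spamicity >= spam_cutoff:
        return "Yes"
    if ham_cutoff > 0.0 and spamicity > ham_cutoff:
        return "Unsure"
    return "No"


# Values from Andreas's run on viagra01.txt and his "bogofilter -Q" output.
print(bogosity(0.987342, spam_cutoff=0.99, ham_cutoff=0.0))
```

Because 0.987342 falls just short of 0.99, the verdict is "No", which matches the X-Bogosity header quoted above.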

Andreas,

It looks like you're using a combination of bogofilter 0.17.5's new
scoring parameters and your own values.  "bogofilter -C -Q" shows the
following parameters (with no config file):

robx        = 0.520000  # (5.20e-01)
robs        = 0.017800  # (1.78e-02)
min_dev     = 0.375000  # (3.75e-01)
ham_cutoff  = 0.000000  # (0.00e+00)
spam_cutoff = 0.990000  # (9.90e-01)

You have the same values, except for robx.
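To see what the differing robx does, here is a simplified sketch of Gary Robinson's smoothing formula, which the robx/robs names refer to: f(w) = (robs * robx + n * p) / (robs + n). Here n is the number of training messages containing a token and p is its raw spam probability. The min_dev test (ignore tokens closer to 0.5 than min_dev) is also included. This is an assumption-laden illustration: how bogofilter actually derives n and p from its word counts, and its exact boundary handling, may differ, and the function names are hypothetical.

```python
def robinson_fw(n, p, robx, robs):
    """Robinson-style smoothed token probability (simplified sketch):
    f(w) = (robs * robx + n * p) / (robs + n)
    n -- how many training messages contained the token (assumed meaning)
    p -- the token's raw spam probability from those messages
    """
    return (robs * robx + n * p) / (robs + n)


def counts_toward_score(fw, min_dev):
    """Assumed min_dev rule: ignore tokens closer to 0.5 than min_dev."""
    return abs(fw - 0.5) >= min_dev


ROBS = 0.0178     # identical in both parameter sets
MIN_DEV = 0.375   # identical in both parameter sets

for robx in (0.644661, 0.52):   # Andreas's robx vs. the 0.17.5 default
    unseen = robinson_fw(0, 0.0, robx, ROBS)   # never-seen token: f(w) == robx
    rare = robinson_fw(1, 1.0, robx, ROBS)     # seen once, in spam only
    print(f"robx={robx}: unseen f(w)={unseen:.4f} "
          f"(used: {counts_toward_score(unseen, MIN_DEV)}), "
          f"rare f(w)={rare:.4f} (used: {counts_toward_score(rare, MIN_DEV)})")
```

Under these assumptions, a never-seen token scores exactly robx and is dropped by min_dev with either setting. A token seen once in spam scores about 0.994 with robx=0.644661 versus about 0.992 with robx=0.52.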

Suggestion 1:  Use only the default parameters, keep on training, and
bogofilter will do well (once there's been enough training).

Suggestion 2:  Continue to use your old parameters.  I'm betting that
they were working well for you.  If that is so, there's no need to
change. If it ain't broken, don't fix it!

Suggestion 3:  Lower the spam_cutoff to 0.98 or 0.97.  With either
value, your example (spamicity 0.987342) would be classified as spam.
Lower values also increase the likelihood of a false positive.  You'll
have to decide for yourself which is more important: catching messages
like this one as spam or keeping the chance of a false positive low.
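One way to make the Suggestion 3 trade-off concrete is to score some mail you have already sorted by hand. Then count, for each candidate spam_cutoff, how many known ham messages would be flagged (false positives) and how many known spams would slip through (false negatives). The sketch below assumes two-state classification (ham_cutoff = 0) with spam meaning spamicity >= spam_cutoff. The score lists are made-up placeholders, not real data, and the helper names are hypothetical.

```python
def count_errors(ham_scores, spam_scores, spam_cutoff):
    """Two-state setup (ham_cutoff = 0): spam means spamicity >= spam_cutoff."""
    false_positives = sum(1 for s in ham_scores if s >= spam_cutoff)
    false_negatives = sum(1 for s in spam_scores if s < spam_cutoff)
    return false_positives, false_negatives


def lowest_safe_cutoff(ham_scores, candidates):
    """Lowest candidate cutoff that flags none of the known ham as spam."""
    safe = [c for c in candidates if all(s < c for s in ham_scores)]
    return min(safe) if safe else None


# Hypothetical scores; substitute spamicity values from your own hand-sorted mail.
ham_scores = [0.001, 0.020, 0.150, 0.640, 0.955]
spam_scores = [0.987342, 0.970, 0.995, 0.999, 0.9999]

candidates = (0.99, 0.98, 0.97, 0.95)
for cutoff in candidates:
    fp, fn = count_errors(ham_scores, spam_scores, cutoff)
    print(f"spam_cutoff={cutoff:.2f}: false positives={fp}, false negatives={fn}")

print("lowest safe cutoff:", lowest_safe_cutoff(ham_scores, candidates))
```

With these placeholder numbers, dropping from 0.99 to 0.97 removes both false negatives without flagging any ham, while 0.95 starts catching the 0.955 ham message. Your own mail will give different numbers, which is exactly the point of running it.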

Hope this helps!

David