Getting rid of plain obvious spam
andreas at pardeike.net
Wed Apr 7 10:36:41 EDT 2004
On 2004-04-07, at 13.36, David Relson wrote:
> It looks like you're using a combination of bogofilter 0.17.5's new
> scoring parameters and your own values. "bogofilter -C -Q" shows the
> following parameters (without config file):
> robx = 0.520000 # (5.20e-01)
> robs = 0.017800 # (1.78e-02)
> min_dev = 0.375000 # (3.75e-01)
> ham_cutoff = 0.000000 # (0.00e+00)
> spam_cutoff = 0.990000 # (9.90e-01)
> You have the same values, except for robx.
You're right. Four days ago, I found out that our old wordlist.db had
only 1700 ham but 52000 spam messages in it. I am not aware of any
mistakes I made in the past, but who knows. So I thought it would be
good to start over: I collected about 1000 messages that were already
sorted into ham/spam and generated an initial wordlist from that corpus.
Then, I think, I played around with bogoutil and accidentally ran it in
a way that updated the values (hence the different robx).
> Suggestion 1: Use only the default parameters, keep on training, and
> bogofilter will do well (once there's been enough training).
> Suggestion 2: Continue to use your old parameters. I'm betting that
> they were working well for you. If that is so, there's no need to
> change. If it ain't broken, don't fix it!
It's difficult for me to decide between 1) and 2) without more info.
A month ago, 1) is what got me to the problem that I originally tried to fix.
> Suggestion 3: Lower the spam_cutoff to 0.98 or 0.97. Lower values
> increase the likelihood of a false positive. You'll have to decide for
> yourself which is more important -- having the message be identified as
> spam or increasing the chances of a false positive.
I see 3) as an additional parameter that I can tweak together with a
choice of 1) or 2).
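
To make the trade-off in 3) concrete, here is a minimal sketch in Python. It assumes a message counts as spam when its score is at or above spam_cutoff. The scores are hypothetical and invented purely for illustration, and count_flagged is a helper made up for this example, not part of bogofilter.

```python
def count_flagged(scores, spam_cutoff):
    """Count messages whose score is at or above the cutoff.

    Assumption: a message is treated as spam when score >= spam_cutoff.
    """
    return sum(1 for s in scores if s >= spam_cutoff)


# Hypothetical scores, invented for illustration only (not real data).
spam_scores = [0.999, 0.995, 0.985, 0.975, 0.60]
ham_scores = [0.01, 0.20, 0.972, 0.05]

for cutoff in (0.99, 0.98, 0.97):
    caught = count_flagged(spam_scores, cutoff)
    false_pos = count_flagged(ham_scores, cutoff)
    print(f"cutoff={cutoff}: caught {caught}/{len(spam_scores)} spam, "
          f"{false_pos} false positive(s)")
```

With these made-up numbers, each step down catches one more spam, and at 0.97 one ham message crosses the cutoff too. Real score distributions will of course look different.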
BTW: quick poll - what are you people using to verify the work of
a) undetected_spam / total_spam ratio (in %)
b) undetected_spam / total_ham ratio (in %)
(You can see my figures at http://www.pardeike.net/junk.html?ratio)
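
For anyone who wants to compare numbers, this is roughly how I would compute a) and b). It is only a sketch: the counts below are hypothetical placeholders, not my real figures, and ratio_percent is a helper name made up for this example.

```python
def ratio_percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return 100.0 * numerator / denominator


# Hypothetical counts for illustration only; plug in your own.
undetected_spam = 30
total_spam = 1000
total_ham = 500

ratio_a = ratio_percent(undetected_spam, total_spam)  # a) undetected_spam / total_spam
ratio_b = ratio_percent(undetected_spam, total_ham)   # b) undetected_spam / total_ham

print(f"a) {ratio_a:.1f}% of all spam went undetected")
print(f"b) undetected spam is {ratio_b:.1f}% of the ham volume")
```

a) tells you how much spam slips through, while b) is closer to what the inbox feels like, since it relates the missed spam to the amount of legitimate mail.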