Bogofilter 0.93.5 -- problems with learning
Jason A. Smith
jason-bf at jazbo.dyndns.org
Tue Aug 1 14:08:10 CEST 2006
On Mon, 2006-07-31 at 23:45, Matej Cepl wrote:
> Hi,
>
> after a couple of years of successfully using bogofilter on my account, my
> wife is in the process of switching from her university account to
> using bogofilter on the remote mail server as well (bfproxy to the rescue --
> actually, while switching on her spam filtering I found a bunch of bugs in
> it). I have trained her database with some 2500 spams
> (most of them caught by SA at her university -- details at
> http://web.mit.edu/ist/services/email/nospam/index.html) and a similar number
> of legitimate mails, with which we tried to cover all relevant groups of
> email in her mailboxen.
>
> However, after a couple of days, the ratio of unsure to spam is around 50:50
> (or rather 20:20 -- she doesn't get that much email). Even more than the
> amount of spam (I hope it will just settle down), I am concerned about the
> number of spams which have bogosity around 0.50. For example, for the spam
> 102.msg (available at http://matej.ceplovi.cz/tmp/bogo/) I get Bogosity
> 0.50 (even with R and bogo.R).
>
> When running it through bogofilter -vv, I get this strange graph:
>
> X-Bogosity: Unsure, spamicity=0.500000, version=0.93.5
>  int  cnt     prob  spamicity  histogram
> 0.00  224 0.014535   0.009951  ################################################
> 0.10   10 0.110217   0.013336  ###
> 0.20    0 0.000000   0.013336
> 0.30    0 0.000000   0.013336
> 0.40    0 0.000000   0.013336
> 0.50    0 0.000000   0.013336
> 0.60    0 0.000000   0.013336
> 0.70    0 0.000000   0.013336
> 0.80    6 0.888191   0.049344  ##
> 0.90   99 0.983172   0.439182  ######################
>
> Is this normal?
I believe this is normal with the default bogofilter configuration.
Check the output of "bogofilter --query" and look for the min_dev
parameter. By default (min_dev = 0.375000) bogofilter ignores all but the
most spammy and hammy tokens, i.e. any token whose score is within 0.375 of
0.5 (between 0.125 and 0.875), which is why there is a large gap in the
middle of the graph. Is this the strangeness that you are referring to?
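
To illustrate the effect, here is a rough Python sketch of how a min_dev
cutoff empties the middle of such a histogram. It is not bogofilter's actual
code: the function names and the sample scores are made up for illustration,
and the handling of a token exactly at the cutoff is an assumption.

```python
# Illustrative sketch only; not bogofilter's implementation.
MIN_DEV = 0.375  # default value reported by "bogofilter --query" in this thread


def is_significant(score, min_dev=MIN_DEV):
    """Return True if a token score is far enough from 0.5 to be counted.

    Assumption: a token exactly at the cutoff is kept.
    """
    return abs(score - 0.5) >= min_dev


def histogram(scores, min_dev=MIN_DEV):
    """Count the significant token scores in ten buckets of width 0.1."""
    counts = [0] * 10
    for score in scores:
        if is_significant(score, min_dev):
            # Clamp so that a score of exactly 1.0 lands in the last bucket.
            counts[min(int(score * 10), 9)] += 1
    return counts


# Hypothetical token scores, not taken from the message discussed here.
sample_scores = [0.01, 0.05, 0.11, 0.30, 0.50, 0.62, 0.88, 0.95, 0.99]

for bucket, count in enumerate(histogram(sample_scores)):
    print(f"{bucket / 10:.2f} {count:3d} {'#' * count}")
```

With the default cutoff, only the 0.00-0.125 and 0.875-1.00 ranges can
contain tokens, so the 0.20 through 0.70 rows always show a count of 0, and
the 0.10 and 0.80 rows only pick up the tokens near their outer edges.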
> The version of bogofilter is
> ceplovi@ns:~$ bogofilter -V
> bogofilter version 0.93.5
> Database: Sleepycat Software: Berkeley DB 3.2.9: (April 7, 2002)
> Copyright (C) 2002-2004 Eric S. Raymond,
> David Relson, Matthias Andree, Greg Louis
>
> on Debian/woody (no, I cannot upgrade -- but the admin plans to procure a
> new server and install the current Debian/stable, which would currently
> mean 0.94.4, or 1.0.2 if backports were installed). And yes, I have
> checked the integrity of the database, and bogoutil --db-verify doesn't
> report anything.
>
> Am I just impatient, or is there something wrong with my configuration of
> bogofilter? What other information would you need?
>
> Thanks a lot,
>
> Matěj