Bogofilter 0.93.5 -- problems with learning

Jason A. Smith jason-bf at jazbo.dyndns.org
Tue Aug 1 14:08:10 CEST 2006


On Mon, 2006-07-31 at 23:45, Matej Cepl wrote:
> Hi,
> 
> after a couple of years of successfully using bogofilter on my account, my
> wife is in the process of switching from her university account to
> using bogofilter on the remote mail server as well (bfproxy to the rescue --
> actually, while switching on her spam filtering I found a bunch of bugs in
> it). I have trained her database with some 2500 spams
> (most of them caught by SA at her university -- details at
> http://web.mit.edu/ist/services/email/nospam/index.html) and a similar number
> of legitimate mails, where we tried to cover all relevant groups of email in
> her mailboxen.
> 
> However, after a couple of days, the ratio of unsure/spam is around 50:50 (or
> more likely 20:20 -- she doesn't get that many emails). Even more than the
> amount of spam (I hope it will just settle down), I am concerned with the
> number of spams which have bogosity around 0.50. Consider the spam 102.msg
> (available at http://matej.ceplovi.cz/tmp/bogo/): I get bogosity
> 0.50 (even with R and bogo.R).
> 
> When running through bogofilter -vv, I get this strange graph:
> 
> X-Bogosity: Unsure, spamicity=0.500000, version=0.93.5
>    int  cnt   prob  spamicity histogram
>   0.00  224 0.014535 0.009951 ################################################
>   0.10   10 0.110217 0.013336 ###
>   0.20    0 0.000000 0.013336 
>   0.30    0 0.000000 0.013336 
>   0.40    0 0.000000 0.013336 
>   0.50    0 0.000000 0.013336 
>   0.60    0 0.000000 0.013336 
>   0.70    0 0.000000 0.013336 
>   0.80    6 0.888191 0.049344 ##
>   0.90   99 0.983172 0.439182 ######################
> 
> Is this normal? 

I believe this is normal with the default bogofilter configuration.
Check the output of "bogofilter --query" and look for the min_dev
parameter.  By default bogofilter ignores all but the most spammy and hammy
tokens (min_dev = 0.375000), so tokens whose probability lies within 0.375
of 0.5 are not scored, which is why there is a large gap in the middle of
the graph.  Is this the strangeness that you are referring to?
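
To make the min_dev cutoff and the 0.5 result more concrete, here is a
rough Python sketch.  It is not bogofilter's code.  It assumes the
chi-square (Fisher) combining formula from Gary Robinson's write-up, and it
assumes every token in a histogram row has exactly that row's average
"prob" value, which is only an approximation of the real message.  The
function names are made up for illustration.

```python
import math

MIN_DEV = 0.375  # the min_dev value quoted above


def significant(probs, min_dev=MIN_DEV):
    """Keep only token probabilities at least min_dev away from 0.5."""
    return [p for p in probs if abs(p - 0.5) >= min_dev]


def chi2_sf_even(x2, dof):
    """Chi-square survival function for an even number of degrees of freedom.

    Uses the closed form exp(-m) * sum(m**i / i!) for i < dof/2, with m = x2/2.
    Each term is evaluated in log space to avoid overflow and underflow.
    """
    if x2 <= 0.0:
        return 1.0
    m = x2 / 2.0
    log_m = math.log(m)
    total = math.fsum(
        math.exp(-m + i * log_m - math.lgamma(i + 1)) for i in range(dof // 2)
    )
    return min(total, 1.0)


def fisher_score(probs):
    """Combine token probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    if n == 0:
        return 0.5
    # h is close to 1 when the tokens look spammy, s when they look hammy.
    h = chi2_sf_even(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    s = chi2_sf_even(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + h - s) / 2.0


# Assumption: (count, average prob) taken from the non-empty histogram rows.
rows = [(224, 0.014535), (10, 0.110217), (6, 0.888191), (99, 0.983172)]
tokens = [p for count, p in rows for _ in range(count)]

kept = significant(tokens)
score = fisher_score(kept)
print(len(kept), score)
```

Under these assumptions all 339 tokens survive the cutoff, and the score
lands very close to 0.5: the message contains strong evidence in both
directions, so both tail probabilities are tiny and they cancel out.
Feeding fisher_score only the 99 spammy tokens gives a value close to 1.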

> The version of bogofilter is
> ceplovi at ns:~$ bogofilter -V
> bogofilter version 0.93.5
>     Database: Sleepycat Software: Berkeley DB 3.2.9: (April  7, 2002)
> Copyright (C) 2002-2004 Eric S. Raymond,
>     David Relson, Matthias Andree, Greg Louis
> 
> on Debian/woody (no, I cannot upgrade -- but the admin plans to procure a new
> server and install a new version of Debian/stable -- which would currently
> mean 0.94.4; or 1.0.2 if backports were installed). And yes, I have
> checked the integrity of the database and bogoutil --db-verify doesn't say
> anything.
> 
> Am I just impatient, or is there something wrong with my configuration of
> bogofilter? What other information would you need?
> 
> Thanks a lot,
> 
> Matěj


