Bogofilter 0.93.5 -- problems with learning

Matej Cepl ceplm at seznam.cz
Tue Aug 1 05:45:37 CEST 2006


Hi,

after a couple of years of successfully using bogofilter on my account, my
wife is in the process of switching from her university account to
bogofilter on the remote mail server as well (bfproxy to the rescue --
actually, while setting up her spam filtering I found a bunch of bugs in
it). I have trained her database with some 2500 spams (most of them
flagged by SA at her university -- details at
http://web.mit.edu/ist/services/email/nospam/index.html) and a similar
number of legitimate messages, trying to cover all relevant groups of
email in her mailboxen.

However, after a couple of days, the ratio of unsure to spam is around
50:50 (or rather 20:20 -- she doesn't get that much email). Even more than
the amount of spam (I hope that will just settle down), I am concerned
about the number of spams with a bogosity around 0.50. For example, for
one spam (102.msg, available at http://matej.ceplovi.cz/tmp/bogo/) I get
a bogosity of 0.50 (even with R and bogo.R).

When I run it through bogofilter -vv, I get this strange histogram:

X-Bogosity: Unsure, spamicity=0.500000, version=0.93.5
   int  cnt   prob  spamicity histogram
  0.00  224 0.014535 0.009951 ################################################
  0.10   10 0.110217 0.013336 ###
  0.20    0 0.000000 0.013336
  0.30    0 0.000000 0.013336
  0.40    0 0.000000 0.013336
  0.50    0 0.000000 0.013336
  0.60    0 0.000000 0.013336
  0.70    0 0.000000 0.013336
  0.80    6 0.888191 0.049344 ##
  0.90   99 0.983172 0.439182 ######################
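A rough way to see how a histogram like this can end up at exactly 0.50 is
Robinson's chi-square (Fisher) combining, which I believe is the default
method in this bogofilter version. The Python sketch below is the textbook
formula (in the arrangement SpamBayes uses), not bogofilter's own code. It
also assumes that each bucket's "prob" column can stand in for every token
in that bucket, which is only an approximation of the real per-token values.

```python
import math
from typing import Sequence


def chi2q(x2: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic x2 with an even
    number of degrees of freedom df (closed-form Poisson series)."""
    assert df % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)  # underflows to 0.0 for huge m, which is harmless here
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities f(w) with Robinson's chi-square
    method: about 1 for spam, about 0 for ham, about 0.5 when strong spam
    evidence and strong ham evidence cancel out."""
    n = len(probs)
    ln_not_spam = sum(math.log(1.0 - p) for p in probs)  # ln prod(1 - f(w))
    ln_spam = sum(math.log(p) for p in probs)            # ln prod(f(w))
    s = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)  # near 1 if many tokens are spammy
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)      # near 1 if many tokens are hammy
    return (1.0 + s - h) / 2.0


# (token count, mean prob) per non-empty bucket, copied from the -vv output;
# treating every token in a bucket as having the bucket mean is an approximation.
buckets = [(224, 0.014535), (10, 0.110217), (6, 0.888191), (99, 0.983172)]
tokens = [p for count, p in buckets for _ in range(count)]
print(f"{fisher_spamicity(tokens):.6f}")  # very close to 0.5
```

With 224 strongly hammy and 99 strongly spammy tokens, both chi-square
tests reject their hypothesis overwhelmingly, so s and h are both close to
1 and cancel, leaving a result near 0.5. Under this method, "unsure" at
0.50 means strong evidence in both directions rather than a lack of evidence.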

Is this normal? My bogofilter version is:

ceplovi@ns:~$ bogofilter -V
bogofilter version 0.93.5
    Database: Sleepycat Software: Berkeley DB 3.2.9: (April  7, 2002)
Copyright (C) 2002-2004 Eric S. Raymond,
    David Relson, Matthias Andree, Greg Louis

running on Debian/woody (no, I cannot upgrade -- but the admin plans to get
a new server and install the current Debian/stable, which would currently
mean 0.94.4, or 1.0.2 if backports were installed). And yes, I have
checked the integrity of the database, and bogoutil --db-verify does not
report any problems.

Am I just impatient, or is there something wrong with my bogofilter
configuration? What other information would you need?

Thanks a lot,

Matěj

-- 
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
http://www.ceplovi.cz/matej/blog/, Jabber: ceplma at jabber.cz
23 Marion St. #3, (617) 876-1259, ICQ 132822213
 
How many Bavarian Illuminati does it take to screw in a light
bulb?
Three: one to screw it in, and one to confuse the issue.