more test results

David Relson relson at osagesoftware.com
Thu Feb 13 19:38:04 CET 2003


Greetings,

More results of efficacy testing to measure the value of 3 bogofilter 
config options:

asc - replace-non-ascii
net - block-on-subnets
tag - tag-header-lines

Each test used wordlists built (from the Oct-Dec-2002 training corpus) 
using the same options being tested.  All 8 combinations of the 3 options 
were tested (with "def" denoting the default config with all 3 options 
disabled).  The contrib/randomtrain script was used to train-on-errors and 
the same shuffled message order was used for the 8 tests.
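
The train-on-error procedure described above can be sketched roughly as 
follows.  This is only an illustration, not the contrib/randomtrain script: 
ToyFilter, train_on_error and the seed argument are made-up names, and the 
toy classifier is two-way, so bogofilter's "unsure" result is not modeled.

```python
import random


class ToyFilter:
    """Hypothetical stand-in for bogofilter: remembers words seen in each class."""

    def __init__(self):
        self.spam_words = set()
        self.good_words = set()

    def classify(self, text):
        # Call a message spam if it shares more words with trained spam than with trained ham.
        words = set(text.split())
        return len(words & self.spam_words) > len(words & self.good_words)

    def train(self, text, is_spam):
        target = self.spam_words if is_spam else self.good_words
        target.update(text.split())


def train_on_error(messages, classify, train, seed=0):
    """Classify (text, is_spam) pairs in a fixed shuffled order, training only on mistakes."""
    order = list(messages)
    random.Random(seed).shuffle(order)  # same seed gives the same order for every config
    errors = {"spam": 0, "good": 0}
    for text, is_spam in order:
        if classify(text) != is_spam:
            errors["spam" if is_spam else "good"] += 1
            train(text, is_spam)  # learn from the error before the next message
    return errors
```

Running it once per option set with the same seed keeps the message order 
identical across runs, so the error counts are directly comparable.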

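The numbers are below.  The "spam" and "good" columns give the number of 
messages of each type, and the "reg" column after each gives the number of 
those messages that bogofilter classified incorrectly.  The randomtrain 
script trains bogofilter on each error so that it can do better on later 
messages.  For comparison purposes, I have included the earlier test 
results at the end of each data line:  s-s, s-h and s-u count spam 
classified as spam, ham and unsure, and h-s, h-h and h-u count ham the 
same way.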

With all the numbers together, it's interesting to note that the good-reg 
number is virtually the same as the ham-unsure (h-u) number.  In contrast, 
the spam-reg value is 50-60 messages (about 40%) smaller than the 
spam-unsure (s-u) number.  The implication is that train-on-error for the 
spam messages helps a lot in identifying later spam, while the ham results 
are hardly affected.  This may be caused by the many duplicated spams 
received by my mail server.

David

02/13 10:37
              spam  reg   good  reg     s-s  s-h  s-u  h-s   h-h  h-u
def          1745   82   5044  123    1609   3   133   2   4918  124
asc          1745   83   5044  123    1608   3   134   2   4918  124
net          1745   81   5044  113    1604   5   136   2   4934  108
tag          1745   81   5044  125    1604   3   138   2   4918  124
asc-net      1745   83   5044  113    1602   4   139   2   4934  108
asc-tag      1745   82   5044  125    1603   3   139   2   4918  124
net-tag      1745   85   5044  114    1599   4   142   2   4928  114
net-tag-asc  1745   87   5044  114    1597   3   145   2   4928  114
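
For anyone who wants to check the comparison between the "reg" columns and 
the unsure columns, here is a small sketch that recomputes it from the rows 
above.  The function names parse and gaps are just illustrative; the 
numbers are copied from the table.

```python
# Columns: name, spam, spam reg, good, good reg, s-s, s-h, s-u, h-s, h-h, h-u
ROWS = """
def          1745   82   5044  123    1609   3   133   2   4918  124
asc          1745   83   5044  123    1608   3   134   2   4918  124
net          1745   81   5044  113    1604   5   136   2   4934  108
tag          1745   81   5044  125    1604   3   138   2   4918  124
asc-net      1745   83   5044  113    1602   4   139   2   4934  108
asc-tag      1745   82   5044  125    1603   3   139   2   4918  124
net-tag      1745   85   5044  114    1599   4   142   2   4928  114
net-tag-asc  1745   87   5044  114    1597   3   145   2   4928  114
"""


def parse(text):
    """Turn the fixed-width rows into {name: [ints]}."""
    rows = {}
    for line in text.strip().splitlines():
        name, *nums = line.split()
        rows[name] = [int(n) for n in nums]
    return rows


def gaps(rows):
    """Return {name: (s-u minus spam reg, h-u minus good reg)}."""
    out = {}
    for name, (spam, s_reg, good, g_reg, ss, sh, su, hs, hh, hu) in rows.items():
        # The three-way counts should add up to the corpus sizes.
        assert ss + sh + su == spam and hs + hh + hu == good
        out[name] = (su - s_reg, hu - g_reg)
    return out


if __name__ == "__main__":
    for name, (spam_gap, good_gap) in gaps(parse(ROWS)).items():
        print(f"{name:12} spam gap {spam_gap:3}  good gap {good_gap:3}")
```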


classification parameters - all bogofilter's default values:

robs - 0.001
robx - 0.415
min_dev - 0.10
spam_cutoff - 0.95
ham_cutoff - 0.10
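
As a rough guide to what these parameters control, here is a sketch under 
some assumptions.  It takes robs and robx to be the s and x of Robinson's 
formula f(w) = (s*x + n*p(w)) / (s + n), min_dev to be the minimum distance 
from 0.5 for a token to count, and the two cutoffs to split the final score 
three ways.  The function names and the exact boundary handling (>= and <=) 
are assumptions for illustration, not taken from the bogofilter source.

```python
ROBS = 0.001         # weight given to the prior (Robinson's s)
ROBX = 0.415         # score assumed for a token never seen before (Robinson's x)
MIN_DEV = 0.10       # tokens scoring closer than this to 0.5 are ignored
SPAM_CUTOFF = 0.95
HAM_CUTOFF = 0.10


def robinson_f(p, n, s=ROBS, x=ROBX):
    """Blend the observed spam probability p (from n sightings) with the prior x."""
    return (s * x + n * p) / (s + n)


def is_significant(f, min_dev=MIN_DEV):
    """Keep only tokens whose score is far enough from the neutral 0.5."""
    return abs(f - 0.5) >= min_dev


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Three-way result from a message score (boundary handling is assumed)."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"
```

With robs this small, a token seen even once is scored almost entirely on 
its observed counts, and robx only matters for tokens with no history.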
--------------------------------------------------------
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800




