new writeup re varying Robinson's s and the minimum deviation

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Mar 31 16:19:21 CEST 2003


Greg Louis wrote:

> It appears to be a good idea to throw away tokens with f(w) greater
> than 0.15 and less than 0.85 (ie to set mindev somewhere around 0.35),
> and to use an s value of 0.1 or thereabouts;
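
For readers who do not have the formula at hand, here is a rough
sketch of what the two parameters do. It assumes Robinson's usual
smoothing, f(w) = (s*x + n*p(w)) / (s + n), where n is the number of
messages the token was seen in and p(w) is its raw spam probability.
The function names are made up for illustration and are not
bogofilter's own, and the handling of the exact boundary (>=) is an
assumption.

```python
def robinson_fw(p_w, n, s=0.1, x=0.415):
    """Smoothed token score f(w) = (s*x + n*p(w)) / (s + n).

    s (robs) is the weight given to the prior x (robx); a small s lets
    even rarely seen tokens move far away from x.
    """
    return (s * x + n * p_w) / (s + n)


def is_kept(fw, min_dev=0.35):
    """Keep a token only if its score is at least min_dev away from 0.5.

    With min_dev = 0.35, tokens scoring between 0.15 and 0.85 are dropped.
    (Hypothetical helper; the boundary comparison is an assumption.)
    """
    return abs(fw - 0.5) >= min_dev


if __name__ == "__main__":
    # A token seen once, only in spam, under the two s values discussed here.
    for s in (0.1, 0.001):
        fw = robinson_fw(p_w=1.0, n=1, s=s)
        print(f"s={s}: f(w)={fw:.5f} kept={is_kept(fw)}")
```

With n = 0 the score is just x, so an unseen token (0.415 here) is
always discarded by any min_dev above 0.085.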

I tried these values alongside the settings I have been using
so far. As usual, my test mails are the same messages as my
training database. This is what happened:

For all tests:
algorithm   = fisher
block_on_subnets = no
tag_header_lines = no
replace_nonascii_characters = no
ham_cutoff  = 0.000000 (0.00e+00)
spam_cutoff = 0.600000 (6.00e-01)
robx        = 0.415000 (4.15e-01)
------------------------------------------
robs        = 0.001000 (1.00e-03)
min_dev     = 0.350000 (3.50e-01)

Spam:
   7105 test.spam
False negatives:
362
Ham:
  19239 test.ham
False positives:
0
------------------------------------------
robs        = 0.001000 (1.00e-03)
min_dev     = 0.050000 (5.00e-02)

Spam:
   7105 test.spam
False negatives:
313
Ham:
  19239 test.ham
False positives:
1
------------------------------------------
robs        = 0.100000 (1.00e-01)
min_dev     = 0.350000 (3.50e-01)

Spam:
   7105 test.spam
False negatives:
366
Ham:
  19239 test.ham
False positives:
3
------------------------------------------
robs        = 0.001000 (1.00e-03)
min_dev     = 0.350000 (3.50e-01)

Spam:
   7105 test.spam
False negatives:
362
Ham:
  19239 test.ham
False positives:
0
------------------------------------------
robs        = 0.001000 (1.00e-03)
min_dev     = 0.025000 (2.50e-02)

Spam:
   7105 test.spam
False negatives:
291
Ham:
  19239 test.ham
False positives:
2
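
To compare the runs more easily, the counts above can be turned into
percentages. This is only a small sketch using the numbers reported
in this mail; the first configuration appears twice above with
identical counts, so it is listed once. The helper name is made up
for illustration.

```python
SPAM_TOTAL = 7105    # messages in test.spam
HAM_TOTAL = 19239    # messages in test.ham

# (robs, min_dev, false negatives, false positives), copied from the runs above
RUNS = [
    (0.001, 0.350, 362, 0),
    (0.001, 0.050, 313, 1),
    (0.100, 0.350, 366, 3),
    (0.001, 0.025, 291, 2),
]


def rates(false_neg, false_pos):
    """Return (false negative %, false positive %) for one run."""
    return 100.0 * false_neg / SPAM_TOTAL, 100.0 * false_pos / HAM_TOTAL


if __name__ == "__main__":
    for robs, min_dev, fn, fp in RUNS:
        fn_pct, fp_pct = rates(fn, fp)
        print(f"robs={robs:<6} min_dev={min_dev:<6} "
              f"FN={fn_pct:.2f}%  FP={fp_pct:.4f}%")
```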

pi




