new writeup re varying Robinson's s and the minimum deviation
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Mon Mar 31 16:19:21 CEST 2003
Greg Louis wrote:
> It appears to be a good idea to throw away tokens with f(w) greater
> than 0.15 and less than 0.85 (ie to set mindev somewhere around 0.35),
> and to use an s value of 0.1 or thereabouts;
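For readers following along, here is a minimal sketch of what the two parameters in the quoted suggestion do. It assumes Robinson's smoothing f(w) = (s*x + n*p(w)) / (s + n), where s is robs, x is robx, n is the number of times the token has been seen and p(w) is its raw spam probability. The function names and the sample tokens are hypothetical; this is not bogofilter's actual code.

```python
def robinson_f(p, n, s=0.1, x=0.415):
    """Robinson's smoothed token score: f(w) = (s*x + n*p) / (s + n)."""
    return (s * x + n * p) / (s + n)


def passes_min_dev(fw, min_dev=0.35):
    """Keep a token only if its score lies at least min_dev away from 0.5."""
    return abs(fw - 0.5) >= min_dev


# Hypothetical tokens: word -> (raw spam probability p, observation count n)
tokens = {
    "viagra": (0.99, 40),
    "meeting": (0.05, 25),
    "the": (0.50, 900),
    "rare": (0.99, 0),  # never seen before, so f(w) falls back to robx
}

# Score every token, then drop the ones too close to 0.5 to be informative.
scores = {word: robinson_f(p, n) for word, (p, n) in tokens.items()}
kept = {word: fw for word, fw in scores.items() if passes_min_dev(fw)}

for word, fw in sorted(scores.items()):
    status = "kept" if word in kept else "ignored"
    print(f"{word:8s} f(w)={fw:.4f} {status}")
```

With min_dev = 0.35 only tokens scoring at or below 0.15 or at or above 0.85 contribute. A smaller s lets f(w) move away from robx after fewer observations.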
I tried these values alongside the settings I have been using so
far. As usual, my test mails are the same messages as my training
database. These are the results:
For all tests:
algorithm = fisher
block_on_subnets = no
tag_header_lines = no
replace_nonascii_characters = no
ham_cutoff = 0.000000 (0.00e+00)
spam_cutoff = 0.600000 (6.00e-01)
robx = 0.415000 (4.15e-01)
------------------------------------------
robs = 0.001000 (1.00e-03)
min_dev = 0.350000 (3.50e-01)
Spam:
7105 test.spam
False negatives:
362
Ham:
19239 test.ham
False positives:
0
------------------------------------------
robs = 0.001000 (1.00e-03)
min_dev = 0.050000 (5.00e-02)
Spam:
7105 test.spam
False negatives:
313
Ham:
19239 test.ham
False positives:
1
------------------------------------------
robs = 0.100000 (1.00e-01)
min_dev = 0.350000 (3.50e-01)
Spam:
7105 test.spam
False negatives:
366
Ham:
19239 test.ham
False positives:
3
------------------------------------------
robs = 0.001000 (1.00e-03)
min_dev = 0.350000 (3.50e-01)
Spam:
7105 test.spam
False negatives:
362
Ham:
19239 test.ham
False positives:
0
------------------------------------------
robs = 0.001000 (1.00e-03)
min_dev = 0.025000 (2.50e-02)
Spam:
7105 test.spam
False negatives:
291
Ham:
19239 test.ham
False positives:
2
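To make the runs easier to compare, the counts can be turned into error rates. A small sketch in Python; the numbers are copied from the runs listed above, and the names are only for illustration.

```python
SPAM_TOTAL = 7105   # messages in test.spam
HAM_TOTAL = 19239   # messages in test.ham

# (robs, min_dev, false negatives, false positives), copied from the runs
# above; the run that appears twice is listed once.
runs = [
    (0.001, 0.350, 362, 0),
    (0.001, 0.050, 313, 1),
    (0.100, 0.350, 366, 3),
    (0.001, 0.025, 291, 2),
]


def rates(fn, fp):
    """Return (false-negative %, false-positive %) for one run."""
    return 100.0 * fn / SPAM_TOTAL, 100.0 * fp / HAM_TOTAL


for robs, min_dev, fn, fp in runs:
    fn_pct, fp_pct = rates(fn, fp)
    print(f"robs={robs:<6} min_dev={min_dev:<6} FN={fn_pct:.2f}%  FP={fp_pct:.4f}%")
```

On these counts, lowering min_dev with robs fixed at 0.001 reduces the false negatives and adds one or two false positives.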
pi