[bogofilter-announce] bogofilter-0.13.5 - new current release
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Jun 4 11:02:59 CEST 2003
Greg Louis wrote:
> The value of s should never be less than 0.01, because when it is,
> words that appear in one list but not in the other are heavily
> overweighted in the calculation. "Heavily overweighted" like 3 or 4
> such tokens out of 200, with s around 1e-6, can swing the evaluation
> from spam to nonspam or vice versa. At 0.001 or 0.0001 the effect
> isn't quite that bad but you still risk random errors.
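The effect Greg describes follows from Robinson's smoothed token
probability, f(w) = (s*x + n*p(w)) / (s + n), where n is the number
of training messages containing w, p(w) is the raw spam probability
and x is the prior for unseen tokens. Below is a minimal sketch,
assuming x = 0.5 (the robx value is not given in this thread) and
the training sizes from the test below. It is not bogofilter's own
code. For a token seen in a single spam and no ham, it shows f(w)
approaching 1 as s shrinks, and how much that one token adds to the
spam-side Fisher chi-square sum.

```python
import math

def robinson_f(spam_count, ham_count, n_spam, n_ham, s, x=0.5):
    """Robinson's smoothed token probability f(w) (sketch).

    spam_count/ham_count: training spam/ham messages containing the token.
    n_spam/n_ham: total spam/ham training messages.
    s: strength of the prior (robs); x: the prior itself (0.5 is an assumption).
    """
    b = spam_count / n_spam
    g = ham_count / n_ham
    p = b / (b + g) if b + g > 0 else x   # raw spamicity p(w)
    n = spam_count + ham_count            # amount of evidence for this token
    return (s * x + n * p) / (s + n)

# One token seen in exactly one spam and never in ham.
for s in (1e-6, 1e-4, 1e-3, 1e-2):
    f = robinson_f(1, 0, 10_000, 20_000, s)
    # -2*ln(1-f) is this single token's contribution to the spam-side chi-square sum
    print(f"s={s:g}  f(w)={f:.7f}  -2ln(1-f)={-2 * math.log(1 - f):.1f}")
```

With s = 0.01 the lone token's f(w) is about 0.995, while with
s = 1e-6 it is within about 5e-7 of 1, so its chi-square
contribution roughly triples.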
I ran a new test, training with 20,000 ham and 10,000
spam messages and testing with messages not seen in training:
algorithm=fisher
min_dev=0.025
ham_cutoff = 0.00
spam_cutoff = 0.60
Results (false negatives out of 2769 messages in test.spam,
false positives out of 2184 messages in test.ham):

robs      False negatives   False positives
0.0001    155               4
0.001     111               4
0.01       90               5
------------------------------------------
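To compare the runs it can help to turn the counts into rates. The
numbers below are the counts reported above; the script only divides
them by the sizes of test.spam and test.ham.

```python
# Counts from the three runs above: robs -> (false negatives, false positives).
N_SPAM, N_HAM = 2769, 2184
results = {0.0001: (155, 4), 0.001: (111, 4), 0.01: (90, 5)}

for robs, (fn, fp) in results.items():
    fn_rate = fn / N_SPAM   # fraction of spam that slipped through
    fp_rate = fp / N_HAM    # fraction of ham wrongly tagged as spam
    print(f"robs={robs:<7g} FN {fn:>3} ({fn_rate:6.2%})  FP {fp} ({fp_rate:6.2%})")
```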
Again, for my messages, this shows that the new robs is worse.
With robs=0.001, all false positives score above 0.85; with
robs=0.01, all score above 0.9.
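How the test settings (min_dev, spam_cutoff, ham_cutoff) turn token
probabilities into the scores mentioned above can be sketched with
Gary Robinson's chi-square (Fisher) combining, as popularized by
SpamBayes. This is not bogofilter's source, and two details are
assumptions: tokens whose f(w) lies within min_dev of 0.5 are
dropped, and a message counts as spam when its score is at least
spam_cutoff (with ham_cutoff = 0.00, no "unsure" band is used).

```python
import math

def chi2q(x2, v):
    """Upper-tail probability P(chi^2 with v dof >= x2), valid for even v."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(fs, min_dev=0.025):
    """Combine token probabilities f(w) into a spam score in [0, 1] (sketch)."""
    # Assumption: tokens too close to 0.5 carry little evidence and are ignored.
    clues = [f for f in fs if abs(f - 0.5) >= min_dev]
    if not clues:
        return 0.5
    n = len(clues)
    # S near 1 when the clues look spammy, H near 1 when they look hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in clues), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in clues), 2 * n)
    return (1.0 + s - h) / 2.0

def classify(score, spam_cutoff=0.60, ham_cutoff=0.00):
    """Assumed rule: score >= spam_cutoff is spam; ham_cutoff = 0 disables 'unsure'."""
    if score >= spam_cutoff:
        return "spam"
    if ham_cutoff > 0 and score > ham_cutoff:
        return "unsure"
    return "ham"

# Usage: three informative tokens and one (0.51) that min_dev filters out.
print(classify(fisher_score([0.99, 0.97, 0.2, 0.51])))
```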
pi