[bogofilter-announce] bogofilter-0.13.5 - new current release

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jun 4 11:02:59 CEST 2003


Greg Louis wrote:

> The value of s should never be less than 0.01, because when it is,
> words that appear in one list but not in the other are heavily
> overweighted in the calculation.  "Heavily overweighted" like 3 or 4
> such tokens out of 200, with s around 1e-6, can swing the evaluation
> from spam to nonspam or vice versa.  At 0.001 or 0.0001 the effect
> isn't quite that bad but you still risk random errors.
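
To make the overweighting concrete, here is a minimal sketch of Robinson's smoothing formula, f(w) = (s*x + n*p) / (s + n), applied to a token seen once in ham and never in spam. It assumes robs plays the role of s; the x value of 0.5 and the helper names are illustrative assumptions, not bogofilter's actual code or defaults.

```python
import math

def robinson_f(p, n, s, x=0.5):
    """Robinson's smoothed token score: f(w) = (s*x + n*p) / (s + n).

    p: raw spam probability of the token, n: times the token was seen,
    s: strength of the prior (robs), x: assumed prior score (0.5 here
    is an illustrative choice only).
    """
    return (s * x + n * p) / (s + n)

def fisher_term(f):
    """Contribution of one token to a Fisher-style sum, -2 * ln f(w)."""
    return -2.0 * math.log(f)

if __name__ == "__main__":
    # Token seen once, only in ham: p = 0, n = 1.
    for s in (1e-6, 1e-4, 1e-3, 1e-2):
        f = robinson_f(p=0.0, n=1, s=s)
        print(f"s={s:g}  f(w)={f:.3g}  -2*ln f(w)={fisher_term(f):.1f}")
```

The smaller s is, the closer f(w) gets to 0 for such a token, and the larger its logarithmic contribution becomes, so a handful of one-sided tokens can dominate the combined score.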

I ran some new tests, training with 20,000 ham and 10,000
spam and testing with unknown messages:

algorithm=fisher
min_dev=0.025
ham_cutoff = 0.00
spam_cutoff = 0.60
robs=0.0001

Spam:
   2769 test.spam
False negatives:
155
Ham:
   2184 test.ham
False positives:
4
------------------------------------------
robs=0.001

Spam:
   2769 test.spam
False negatives:
111
Ham:
   2184 test.ham
False positives:
4
------------------------------------------
robs=0.01

Spam:
   2769 test.spam
False negatives:
90
Ham:
   2184 test.ham
False positives:
5
------------------------------------------
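
For easier comparison, the counts above can be converted to error rates. This is only a small sketch using the numbers from the three runs; the dictionary layout and function name are made up for illustration.

```python
# Counts copied from the three test runs above.
N_SPAM = 2769  # messages in test.spam
N_HAM = 2184   # messages in test.ham

RESULTS = {
    0.0001: {"false_negatives": 155, "false_positives": 4},
    0.001: {"false_negatives": 111, "false_positives": 4},
    0.01: {"false_negatives": 90, "false_positives": 5},
}

def error_rates(false_negatives, false_positives,
                n_spam=N_SPAM, n_ham=N_HAM):
    """Return (false negative rate, false positive rate) as fractions."""
    return false_negatives / n_spam, false_positives / n_ham

if __name__ == "__main__":
    for robs, counts in RESULTS.items():
        fn_rate, fp_rate = error_rates(**counts)
        print(f"robs={robs:g}  FN rate={fn_rate:.2%}  FP rate={fp_rate:.2%}")
```

The false negative rate falls from about 5.6% to about 3.3% as robs goes from 0.0001 to 0.01, while the false positive rate moves from about 0.18% to about 0.23%.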

This again shows that, for my messages, the new robs is worse.

For robs=0.001, all false positives have values above 0.85; for
robs=0.01, above 0.9.

pi




