tuning.sh [was: bogofilter-0.13.6.3 - new current release]
Greg Louis
glouis at dynamicro.on.ca
Fri Jun 20 17:04:12 CEST 2003
On 20030620 (Fri) at 1408:14 +0200, Boris 'pi' Piwinger wrote:
> David Relson wrote:
>
> > The rule of thumb is that $target should be 0.1% to 0.3% of the test set
> > size.
>
> For me that would be more than 70 (using .2%). Or do you
> only count half the size since the other half is used to
> build the database? Anyhow this would still be way too big.
>
YET AGAIN ONCE MORE ANOTHER (and last) TIME:
You need more false positives to run a parameter scan than you want to
have in production. David is quite right: if you use tuning.sh you
should set the target somewhere between 0.1% and 0.3% of the total size
of your test files (r?.ns). Your own results show that 12 is too low,
as some records in your scan report more fp than the target -- and any
record for which the fp count is not the target is invalid.
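A minimal Python sketch of the rule of thumb above, assuming "size" means the number of messages in the test files: the target range runs from 0.1% to 0.3% of that count, and scan records whose fp count differs from the target are discarded. The function names and the record layout (a dict with an "fp" key) are hypothetical; tuning.sh's actual output format may differ.

```python
def target_range(n_messages):
    """Return (low, high) fp targets: 0.1% and 0.3% of the test-set size.

    n_messages is assumed to be the total message count of the test files.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    low = -(-n_messages // 1000)        # ceil(0.1% of n)
    high = (3 * n_messages) // 1000     # floor(0.3% of n)
    return low, high


def valid_records(records, target):
    """Keep only scan records whose false-positive count equals the target.

    Hypothetical record layout: a dict with an "fp" key. Any record whose
    fp count is not the target is treated as invalid and dropped.
    """
    return [r for r in records if r["fp"] == target]
```

With this rounding, a target of 12 is in range only for test sets of roughly 4,000 to 12,000 messages; a test set of more than 35,000 messages, which is what a 0.2% target above 70 implies, calls for a target of at least 36.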
AFTER you have your s and mindev (and x and cache size and ham cutoff)
properly set, THEN you ADJUST THE SPAM CUTOFF till you have a balance
between fn and fp that you can live with.
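The final step, adjusting the spam cutoff once the other parameters are fixed, can be sketched in Python as a sweep over candidate cutoffs that counts fp and fn at each one. This assumes you already have per-message spamicity scores for known ham and known spam, and that a message counts as spam when its score is at or above the cutoff; the "unsure" band created by the ham cutoff is ignored for simplicity. Both functions are hypothetical, and pick_cutoff is just one way of expressing "a balance you can live with": the fewest fn among cutoffs that keep fp at or below a limit you choose.

```python
def sweep_cutoffs(ham_scores, spam_scores, cutoffs):
    """For each candidate spam cutoff, return (cutoff, fp, fn).

    Assumes a message is classified as spam when score >= cutoff.
    """
    results = []
    for c in cutoffs:
        fp = sum(1 for s in ham_scores if s >= c)   # ham wrongly called spam
        fn = sum(1 for s in spam_scores if s < c)   # spam that got through
        results.append((c, fp, fn))
    return results


def pick_cutoff(results, max_fp):
    """Among cutoffs with fp <= max_fp, return the one with fewest fn.

    Returns None if no candidate cutoff keeps fp within the limit.
    """
    acceptable = [r for r in results if r[1] <= max_fp]
    if not acceptable:
        return None
    return min(acceptable, key=lambda r: (r[2], r[1]))
```

For example, with ham scores [0.01, 0.2, 0.6, 0.96] and spam scores [0.4, 0.9, 0.97, 0.99], a cutoff of 0.5 gives 2 fp and 1 fn, while 0.95 gives 1 fp and 2 fn; raising the cutoff trades fp for fn.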
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |