Testing fisher
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Wed Jan 29 15:08:38 CET 2003
David Relson wrote:
> Taking a snapshot at the start of a test sequence is a good thing to do.
I am just testing. Usually it is a matter of hours (for one
setting!). Now I get:
real 5m30.244s
user 3m13.320s
sys 1m26.420s
for this script:
#!/bin/sh
echo
echo
rm -f test.ham test.spam
cat test.cf
cat spam* | formail -es bogofilter -c test.cf -v | grep '^X-Bogosity:' > test.spam
echo "Spam:"
wc -l test.spam
echo "False negatives:"
grep -c No test.spam
cat ham* | formail -es bogofilter -c test.cf -v | grep '^X-Bogosity:' > test.ham
echo "Ham:"
wc -l test.ham
echo "False positives:"
grep -c Yes test.ham
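The counts printed by the script can also be turned into error rates. The following is only a minimal sketch, not part of the original script: the helper names error_rate and count_verdict are hypothetical, and it assumes the X-Bogosity lines start with "X-Bogosity: Yes" or "X-Bogosity: No", as produced by the header_format in the config.

```shell
#!/bin/sh
# Hypothetical helper: percentage of misclassified messages.
# $1 = number of errors, $2 = total number of messages.
error_rate() {
    awk -v e="$1" -v t="$2" 'BEGIN {
        if (t == 0) { print "n/a"; exit }
        printf "%.2f%%\n", 100 * e / t
    }'
}

# Hypothetical helper: count the lines in file $1 whose verdict is $2.
# Assumes the verdict directly follows "X-Bogosity: ".
count_verdict() {
    grep -c "^X-Bogosity: $2" "$1"
}

# Only report if the result files written by the test script exist.
if [ -f test.spam ] && [ -f test.ham ]; then
    spam_total=$(wc -l < test.spam)
    ham_total=$(wc -l < test.ham)
    echo "False negative rate: $(error_rate "$(count_verdict test.spam No)" "$spam_total")"
    echo "False positive rate: $(error_rate "$(count_verdict test.ham Yes)" "$ham_total")"
fi
```

Anchoring the pattern at the start of the line is slightly stricter than the plain "grep -c No" above, which would also count a line that merely contains "No" somewhere else.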
With this config:
bogofilter_dir=/home/3.14/local/bogolists/data
#robx=0.48
algorithm=fisher
min_dev=0.25
ham_cutoff = 0.00
spam_cutoff = 0.60
header_format = %h: %c, spamicity=%p, version=%v/%a
These are the relevant files:
-rw------- 1 3.14 3.14 8.3M Jan 29 14:24 /home/3.14/local/bogolists/data/goodlist.db
-rw------- 1 3.14 3.14 3.2M Jan 29 14:24 /home/3.14/local/bogolists/data/spamlist.db
-rw------- 1 3.14 3.14 2.1M Jan 29 14:29 ham
-rw------- 1 3.14 3.14 16M Jan 23 09:38 ham1
-rw------- 1 3.14 3.14 16M Jan 23 09:38 ham2
-rw------- 1 3.14 3.14 17M Jan 23 09:39 ham3
-rw------- 1 3.14 3.14 1.9M Jan 29 14:29 spam
-rw------- 1 3.14 3.14 6.1M Dec 18 10:12 spam1
-rw------- 1 3.14 3.14 8.7M Nov 18 12:26 spam2
-rw------- 1 3.14 3.14 6.2M Dec 18 10:12 spam3
-rw------- 1 3.14 3.14 6.1M Jan 16 15:48 spam4
The test results are plausible. I don't get it. Why is this
so much faster than before? BTW:
bogofilter version 0.10.0
pi