Testing fisher

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jan 29 15:08:38 CET 2003


David Relson wrote:

> Taking a snapshot at the start of a test sequence is a good thing to do.

I am just testing. Usually a run is a matter of hours (for one
setting!). Now I get:

real    5m30.244s
user    3m13.320s
sys     1m26.420s

for this script:

#!/bin/sh
echo
echo
rm -f test.ham test.spam
cat test.cf
cat spam* | formail -es bogofilter -c test.cf -v | grep '^X-Bogosity:' > test.spam
echo "Spam:"
wc -l test.spam
echo "False negatives:"
grep -c No test.spam
cat ham* | formail -es bogofilter -c test.cf -v | grep '^X-Bogosity:' > test.ham
echo "Ham:"
wc -l test.ham
echo "False positives:"
grep -c Yes test.ham
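
The plain "grep -c No" and "grep -c Yes" above match those strings anywhere in the line. A minimal sketch of a stricter tally follows, assuming the collected lines look like "X-Bogosity: Yes, spamicity=..., version=..." (which is what the header_format in test.cf should produce). The helper names count_verdict and error_rate are made up for illustration and are not part of bogofilter.

```shell
#!/bin/sh
# Hypothetical helpers, not part of bogofilter.
# Assumption: each input line looks like
#   X-Bogosity: Yes, spamicity=0.99, version=...

# count_verdict VERDICT FILE
# Print how many header lines carry exactly VERDICT (e.g. Yes or No).
count_verdict() {
    # grep -c exits non-zero on a count of 0, so ignore its status.
    grep -c "^X-Bogosity: $1," "$2" || true
}

# error_rate VERDICT FILE
# Print "errors/total (percent%)", where a line counts as an error
# when its classification field equals VERDICT.
error_rate() {
    awk -v v="$1" '
        /^X-Bogosity:/ {
            total++
            if ($2 == v ",") errors++
        }
        END {
            if (total > 0)
                printf "%d/%d (%.2f%%)\n", errors + 0, total, 100 * errors / total
            else
                print "0/0"
        }
    ' "$2"
}
```

With these defined, "error_rate No test.spam" would report the false negatives and "error_rate Yes test.ham" the false positives, each as a count and a percentage of the messages tested.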

With this config:

bogofilter_dir=/home/3.14/local/bogolists/data
#robx=0.48
algorithm=fisher
min_dev=0.25
ham_cutoff = 0.00
spam_cutoff = 0.60
header_format = %h: %c, spamicity=%p, version=%v/%a

These are the relevant files:
-rw-------    1 3.14     3.14         8.3M Jan 29 14:24 /home/3.14/local/bogolists/data/goodlist.db
-rw-------    1 3.14     3.14         3.2M Jan 29 14:24 /home/3.14/local/bogolists/data/spamlist.db
-rw-------    1 3.14     3.14         2.1M Jan 29 14:29 ham
-rw-------    1 3.14     3.14          16M Jan 23 09:38 ham1
-rw-------    1 3.14     3.14          16M Jan 23 09:38 ham2
-rw-------    1 3.14     3.14          17M Jan 23 09:39 ham3
-rw-------    1 3.14     3.14         1.9M Jan 29 14:29 spam
-rw-------    1 3.14     3.14         6.1M Dec 18 10:12 spam1
-rw-------    1 3.14     3.14         8.7M Nov 18 12:26 spam2
-rw-------    1 3.14     3.14         6.2M Dec 18 10:12 spam3
-rw-------    1 3.14     3.14         6.1M Jan 16 15:48 spam4

The test results are plausible. I don't get it. Why is this
so much faster than before? BTW:
bogofilter version 0.10.0

pi




