Test with bogominitrain.pl
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Thu Jul 31 14:29:32 CEST 2003
Hi!
I did some testing with my bogominitrain.pl (the version
which will be in 0.14.1). Here are the results.
Summary
(false positives among the 10,000 messages in ham-2 /
false negatives among the 5,000 messages in spam-3):
runs \ -o | .501,.501 | .601,.401 | .701,.301
----------+-----------+-----------+-----------
    1     | 111 / 71  |  32 / 85  |  31 / 76
    2     |  60 / 66  |  29 / 68  |  16 / 62
   -f     |  38 / 62  |  27 / 57  |  14 / 60
Using a security margin is clearly beneficial.
Repeated training always improved the results, in some cases
dramatically. Repeating matters most without a margin: total
errors fell from 182 to 100 there, against 117 to 84 and
107 to 74 with a margin.
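To put the summary table in relative terms, the counts can be converted to error rates. The following is only a minimal sketch in Python; the dictionary layout and the function name are illustrative, and the numbers are copied from the summary table (10,000 ham and 5,000 spam test messages).

```python
# Counts copied from the summary table: (false positives, false negatives).
# Keys are (runs, cutoffs passed to -o); this layout is illustrative only.
RESULTS = {
    ("1", ".501,.501"): (111, 71),
    ("2", ".501,.501"): (60, 66),
    ("-f", ".501,.501"): (38, 62),
    ("1", ".601,.401"): (32, 85),
    ("2", ".601,.401"): (29, 68),
    ("-f", ".601,.401"): (27, 57),
    ("1", ".701,.301"): (31, 76),
    ("2", ".701,.301"): (16, 62),
    ("-f", ".701,.301"): (14, 60),
}

HAM_TOTAL = 10_000   # messages in the ham test set
SPAM_TOTAL = 5_000   # messages in the spam test set


def error_rates(fp, fn):
    """Return (false positive rate, false negative rate) in percent."""
    return 100.0 * fp / HAM_TOTAL, 100.0 * fn / SPAM_TOTAL


for (runs, cutoffs), (fp, fn) in RESULTS.items():
    fp_pct, fn_pct = error_rates(fp, fn)
    print(f"{runs:>2}  {cutoffs}  FP {fp_pct:.2f}%  FN {fn_pct:.2f}%  total {fp + fn}")
```

Since the two test sets differ in size, the same count means twice the rate on the spam side, which is worth keeping in mind when comparing the two numbers in a cell.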
The details:
> $ rm -f .bogofilter/*;grep -c '^From ' ham* spam*
> ham:2772
> ham-1:10000
> ham-2:10000
> spam:2815
> spam-1:5000
> spam-2:5000
> spam-3:5000
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
> spam good
> .MSG_COUNT 160 136
>
> False negatives: 62
> False positives: 74
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 111
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 71
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
> spam good
> .MSG_COUNT 224 186
>
> False negatives: 18
> False positives: 13
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 60
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 66
> $ bogominitrain.pl -f .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
> spam good
> .MSG_COUNT 293 234
>
> False negatives: 0
> False positives: 0
>
>
> 8 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 38
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 62
> $ rm -f .bogofilter/*
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
> spam good
> .MSG_COUNT 522 344
>
> False negatives: 241
> False positives: 49
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 32
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 85
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
> spam good
> .MSG_COUNT 656 395
>
> False negatives: 28
> False positives: 7
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 29
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 68
> $ bogominitrain.pl -f .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
> spam good
> .MSG_COUNT 681 404
>
> False negatives: 0
> False positives: 0
>
>
> 2 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 27
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 57
> $ rm -f .bogofilter/*
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> spam good
> .MSG_COUNT 619 422
>
> False negatives: 301
> False positives: 58
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 31
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 76
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> spam good
> .MSG_COUNT 775 467
>
> False negatives: 17
> False positives: 9
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 16
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 62
> $ bogominitrain.pl .bogofilter -fs 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> .MSG_COUNT 794 474
>
> False negatives: 0
> False positives: 0
>
>
> 2 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 14
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 60
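The test commands above count errors with grep -cv: -v selects the lines that do not start with the expected letter, and -c counts them. A rough Python equivalent might look like this; the function name and the sample lines are made up for illustration and are not real bogofilter output.

```python
def count_misclassified(lines, expected):
    """Count lines that do not start with the expected letter.

    Mirrors `grep -cv ^H` (expected="H") or `grep -cv ^S` (expected="S"):
    every line that starts with anything else is counted as an error.
    """
    return sum(1 for line in lines if not line.startswith(expected))


# Hypothetical one-line-per-message results, invented for this example.
sample = ["H 0.012", "S 0.998", "H 0.250", "U 0.500"]

print(count_misclassified(sample, "H"))  # lines not starting with "H"
print(count_misclassified(sample, "S"))  # lines not starting with "S"
```

Note that this way of counting treats every line without the expected prefix as an error, whatever else it starts with.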
pi