Test with bogominitrain.pl

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Jul 31 14:29:32 CEST 2003


Hi!

I ran some tests with my bogominitrain.pl (the version
that will be in 0.14.1). Here are the results.

Summary
 (false positives in 10,000 ham / false negatives in 5,000 spam;
  rows: number of training runs, -f = repeated until no errors remain):

runs \ -o | .501,.501 | .601,.401 | .701,.301
----------+-----------+-----------+-----------
   1      | 111 /  71 |  32 /  85 |  31 /  76
   2      |  60 /  66 |  29 /  68 |  16 /  62
  -f      |  38 /  62 |  27 /  57 |  14 /  60
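
For comparison, the counts above can be turned into error rates. A
small sketch, not part of the test itself; the function name rate is
made up here, and the totals are the 10,000 ham and 5,000 spam test
messages from the summary:

```shell
# rate COUNT TOTAL: print COUNT as a percentage of TOTAL with two decimals.
rate() {
    awk -v n="$1" -v total="$2" 'BEGIN { printf "%.2f%%\n", 100 * n / total }'
}

# No margin (-o .501,.501), single run versus -f:
rate 111 10000   # false positives, one run  -> 1.11%
rate 38 10000    # false positives, -f       -> 0.38%
rate 71 5000     # false negatives, one run  -> 1.42%
rate 62 5000     # false negatives, -f       -> 1.24%
```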

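Using a security margin is clearly beneficial, above all for false
positives: after a single run they drop from 111 to 32 or 31, while
false negatives stay in the same range.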

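Repeated training always improved the results, in some cases
dramatically: without a margin, false positives fell from 111 to 38.
With a margin the first run already does well, so repeating gains
less in absolute terms (32 to 27 and 31 to 14 false positives).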
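
To put a number on the effect of repeating, one can compare the
single-run count with the -f count for each margin. A rough sketch
using the false positive counts from the summary; the function name
reduction is made up for this illustration:

```shell
# reduction BEFORE AFTER: print the relative drop from BEFORE to AFTER
# as a whole-number percentage.
reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f%%\n", 100 * (before - after) / before }'
}

# False positives, one run versus -f:
reduction 111 38   # -o .501,.501 -> 66%
reduction 32 27    # -o .601,.401 -> 16%
reduction 31 14    # -o .701,.301 -> 55%
```

The same function works for the false negative column (71 to 62,
85 to 57, 76 to 60).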


The details:

> $ rm -f .bogofilter/*;grep -c '^From ' ham* spam*
> ham:2772
> ham-1:10000
> ham-2:10000
> spam:2815
> spam-1:5000
> spam-2:5000
> spam-3:5000
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
>                        spam   good
> .MSG_COUNT              160    136
> 
> False negatives: 62
> False positives: 74
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 111
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 71
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
>                        spam   good
> .MSG_COUNT              224    186
> 
> False negatives: 18
> False positives: 13
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 60
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 66
> $ bogominitrain.pl -f .bogofilter 'ham ham-1' 'spam spam-1 spam-2'
> [...]
>                        spam   good
> .MSG_COUNT              293    234
> 
> False negatives: 0
> False positives: 0
> 
> 
> 8 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 38
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 62
> $ rm -f .bogofilter/*
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
>                        spam   good
> .MSG_COUNT              522    344
> 
> False negatives: 241
> False positives: 49
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 32
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 85
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
>                        spam   good
> .MSG_COUNT              656    395
> 
> False negatives: 28
> False positives: 7
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 29
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 68
> $ bogominitrain.pl -f .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.601,0.401'
> [...]
>                        spam   good
> .MSG_COUNT              681    404
> 
> False negatives: 0
> False positives: 0
> 
> 
> 2 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 27
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 57
> $ rm -f .bogofilter/*
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
>                        spam   good
> .MSG_COUNT              619    422
> 
> False negatives: 301
> False positives: 58
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 31
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 76
> $ bogominitrain.pl .bogofilter 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
>                        spam   good
> .MSG_COUNT              775    467
> 
> False negatives: 17
> False positives: 9
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 16
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 62
> $ bogominitrain.pl .bogofilter -fs 'ham ham-1' 'spam spam-1 spam-2' '-o 0.701,0.301'
> [...]
> .MSG_COUNT              794    474
> 
> False negatives: 0
> False positives: 0
> 
> 
> 2 runs needed to close off.
> $ bogofilter -d .bogofilter -vtM <ham-2|grep -cv ^H
> 14
> $ bogofilter -d .bogofilter -vtM <spam-3|grep -cv ^S
> 60
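
The two test commands repeated throughout the log can be wrapped in
a small helper. This is only a sketch: it assumes, as the log
suggests, that bogofilter -t prints one line per message starting
with its class letter (H for ham, S for spam). The function name
count_misclassified and the sample input lines are made up here.

```shell
# count_misclassified LETTER: read classifier output on stdin, one line
# per message, and count the lines that do NOT start with LETTER.
count_misclassified() {
    grep -cv "^$1" || true   # grep exits 1 when the count is 0; ignore that
}

# Usage with the test sets from the log (needs bogofilter and the mbox files):
#   bogofilter -d .bogofilter -vtM <ham-2  | count_misclassified H   # false positives
#   bogofilter -d .bogofilter -vtM <spam-3 | count_misclassified S   # false negatives

# Made-up stand-in input so the helper can be tried without bogofilter:
printf 'H 0.01\nS 0.99\nH 0.20\n' | count_misclassified H
```

Anything that is not classified as expected is counted as an error,
just as with the plain grep -cv in the log.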

pi




