Testing training methods
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Tue Nov 18 13:11:44 CET 2003
Hi!
In the past I have done several tests of training methods:
http://article.gmane.org/gmane.mail.bogofilter.general/4373
http://article.gmane.org/gmane.mail.bogofilter.general/5346
http://article.gmane.org/gmane.mail.bogofilter.general/5403
Here is another set of tests:
The first is with my new version of the lexer, which allows
tokens of length one and two, allows numbers, and slightly
changes the characters accepted at the front and back of a
token. All tests use the default parameters (-C).
sizes of mboxes:
         t    r0    r1    r2    tot
sp   12085  4031  4027  4026  12084
ns   13396  4469  4464  4464  13397
ns: 13397, sp: 12084, target: 34
test: N (full training)
wordlist ns 13396, sp 12085
wo (fn): 0.950000 130 137 116 383
wo (fp): 0.950000 2 2 1 5
wi (fn): 0.498987 44 45 36 125
wi (fp): 0.498987 12 10 12 34
test: R (randomtrain)
wordlist ns 47, sp 401
wo (fn): 0.950000 69 67 54 190
wo (fp): 0.950000 8 8 4 20
wi (fn): 0.908152 45 43 33 121
wi (fp): 0.908152 14 11 9 34
test: M (one run of bogominitrain.pl)
wordlist ns 43, sp 252
wo (fn): 0.950000 85 92 67 244
wo (fp): 0.950000 10 28 18 56
wi (fn): 0.987733 194 182 162 538
wi (fp): 0.987733 4 19 11 34
test: Mf (bogominitrain.pl -fn)
wordlist ns 62, sp 495
wo (fn): 0.950000 61 60 58 179
wo (fp): 0.950000 2 3 2 7
wi (fn): 0.856059 24 29 27 80
wi (fp): 0.856059 10 14 10 34
sizes of the database:
27M test.N.d/wordlist.db
1.7M test.R.d/wordlist.db
1.1M test.M.d/wordlist.db
1.7M test.Mf.d/wordlist.db
Note that no security margin was used for the three
train-on-error methods, so those results are not as good as
you would see in normal production.
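To make the totals easier to compare, they can be turned into
percentages. The following is only a sketch: it assumes the
"wo" rows are counts at the default cutoff of 0.95 and that
the totals refer to the "tot" column of the mbox sizes. The
names (WO_TOTALS, error_rates) are made up for illustration.

```python
# Message counts from the "tot" column of the mbox sizes (assumed test-set sizes).
SPAM_TOTAL = 12084  # sp
HAM_TOTAL = 13397   # ns

# (fn, fp) totals of the "wo" rows in the first run.
WO_TOTALS = {"N": (383, 5), "R": (190, 20), "M": (244, 56), "Mf": (179, 7)}


def error_rates(fn, fp, spam_total=SPAM_TOTAL, ham_total=HAM_TOTAL):
    """Return (false negative rate, false positive rate) in percent."""
    return 100.0 * fn / spam_total, 100.0 * fp / ham_total


for test, (fn, fp) in WO_TOTALS.items():
    fn_pct, fp_pct = error_rates(fn, fp)
    print(f"{test:>2}: fn {fn_pct:.2f}%  fp {fp_pct:.3f}%")
```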
We clearly see:
- Neither one run of randomtrain nor one run of
bogominitrain.pl produces good results. Both carry a high
risk of false positives and leave many false negatives. I
cannot explain why the two produce such different results
here; they should be similar.
- Training to exhaustion (test Mf) was again the best method
in the test, even without a security margin.
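For readers unfamiliar with the idea, training to exhaustion
can be sketched roughly as follows. This is not the code of
bogominitrain.pl and does not use bogofilter's scoring; the
ToyFilter class, its scoring rule, and the margin parameter
(one possible reading of "security margin") are assumptions
made up for this illustration.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a real filter: counts tokens per class."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Share of spam evidence among known tokens; 0.5 if nothing is known.
        s = sum(self.spam[t] for t in set(tokens))
        h = sum(self.ham[t] for t in set(tokens))
        return 0.5 if s + h == 0 else s / (s + h)


def train_to_exhaustion(filt, messages, cutoff=0.5, margin=0.0, max_passes=50):
    """Repeat train-on-error passes until a pass trains on no message.

    Returns the number of passes made (the last one is error free,
    unless max_passes was reached first).
    """
    for passes in range(1, max_passes + 1):
        errors = 0
        for tokens, is_spam in messages:
            score = filt.score(tokens)
            if is_spam:
                wrong = score <= cutoff + margin
            else:
                wrong = score > cutoff - margin
            if wrong:
                filt.train(tokens, is_spam)
                errors += 1
        if errors == 0:
            return passes
    return max_passes


messages = [
    (["cheap", "pills"], True),
    (["meeting", "monday"], False),
    (["cheap", "flights", "monday"], False),
]
filt = ToyFilter()
passes = train_to_exhaustion(filt, messages)
print("passes needed:", passes)
```

A single train-on-error run corresponds to stopping after the
first pass; repeating the pass is what distinguishes test Mf
from test M.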
The second run is as above, but with the lexer we now have
in CVS (including the removal of ' and ` at the end of a token).
test: N
wordlist ns 13396, sp 12085
wo (fn): 0.950000 140 143 122 405
wo (fp): 0.950000 1 3 2 6
wi (fn): 0.498735 43 46 35 124
wi (fp): 0.498735 14 7 13 34
test: R
wordlist ns 49, sp 413
wo (fn): 0.950000 89 96 88 273
wo (fp): 0.950000 11 11 6 28
wi (fn): 0.936851 76 76 72 224
wi (fp): 0.936851 12 12 10 34
test: M
wordlist ns 44, sp 340
wo (fn): 0.950000 170 150 161 481
wo (fp): 0.950000 8 8 7 23
wi (fn): 0.926234 137 119 124 380
wi (fp): 0.926234 10 12 12 34
test: Mf
wordlist ns 56, sp 611
wo (fn): 0.950000 86 90 63 239
wo (fp): 0.950000 4 3 4 11
wi (fn): 0.844118 41 35 33 109
wi (fp): 0.844118 12 11 11 34
sizes of the database:
25M test.N.d/wordlist.db
1.5M test.R.d/wordlist.db
1.4M test.M.d/wordlist.db
2.0M test.Mf.d/wordlist.db
The results are similar to the first run, so it is
interesting to compare the two. For tests N and Mf, the new
lexer gives better results. For R the results look totally
different. M doesn't answer the question either.
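One way to put the two runs side by side is to compare the
false negative totals of the "wi" rows, since both runs have
34 false positives there. This is only a sketch; the
dictionary and function names are made up for illustration.

```python
# Total false negatives in the "wi" rows (34 false positives in each case).
NEW_LEXER_FN = {"N": 125, "R": 121, "M": 538, "Mf": 80}   # first run
CVS_LEXER_FN = {"N": 124, "R": 224, "M": 380, "Mf": 109}  # second run


def fn_change(first, second):
    """Return second minus first for every test; positive means more misses."""
    return {test: second[test] - first[test] for test in first}


for test, delta in fn_change(NEW_LEXER_FN, CVS_LEXER_FN).items():
    print(f"{test:>2}: {delta:+d} false negatives with the CVS lexer")
```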
pi