repetitive-training experiments

Sun Mar 14 16:24:31 CET 2004

A new experiment, comparing the results I get with full training, with
half full training and half train-on-error, and with train-on-error
under three different maximum registration limits, is written up at
http://www.bgl.nu/bogofilter/reptrain2.html
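For readers new to the terms, here is a minimal sketch of repetitive
train-on-error with a per-message registration cap. It is not the
procedure used in the experiment (that is described at the URL above)
and not bogofilter's code. It assumes a message is "in error" when it is
misclassified or unsure, that passes over the training set repeat until
a pass trains nothing, and that no message is registered more than
max_registrations times. The classifier is abstracted as two
hypothetical callbacks, score() and register(), and the toy model in the
demo is made up purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, bool]  # (message id, True if spam)


def needs_training(score: float, is_spam: bool,
                   spam_cutoff: float, ham_cutoff: float) -> bool:
    """Train on error: a message qualifies if misclassified or unsure.

    Assumed convention: score >= spam_cutoff is spam, score <= ham_cutoff
    is ham, anything in between is unsure (requires ham_cutoff <= spam_cutoff).
    """
    if is_spam:
        return score < spam_cutoff   # spam scored as ham or unsure
    return score > ham_cutoff        # ham scored as spam or unsure


def repetitive_train_on_error(messages: List[Message],
                              score: Callable[[str], float],
                              register: Callable[[str, bool], None],
                              max_registrations: int,
                              spam_cutoff: float,
                              ham_cutoff: float) -> Dict[str, int]:
    """Repeat train-on-error passes until a pass registers nothing.

    The cap bounds the total number of registrations, so the loop ends.
    Returns how many times each message was registered.
    """
    counts = {msg_id: 0 for msg_id, _ in messages}
    while True:
        trained = 0
        for msg_id, is_spam in messages:
            if counts[msg_id] >= max_registrations:
                continue  # this message has used up its registrations
            if needs_training(score(msg_id), is_spam, spam_cutoff, ham_cutoff):
                register(msg_id, is_spam)
                counts[msg_id] += 1
                trained += 1
        if trained == 0:
            return counts


# Hypothetical toy model: each registration moves a message's score
# 0.25 toward the correct side, clamped to [0, 1].
toy_scores = {"a": 0.25, "b": 0.5, "c": 0.95}


def toy_register(msg_id: str, is_spam: bool) -> None:
    step = 0.25 if is_spam else -0.25
    toy_scores[msg_id] = min(1.0, max(0.0, toy_scores[msg_id] + step))


print(repetitive_train_on_error(
    [("a", True), ("b", False), ("c", True)],
    toy_scores.__getitem__, toy_register,
    max_registrations=5, spam_cutoff=0.9, ham_cutoff=0.1))
# prints {'a': 3, 'b': 2, 'c': 0}
```

Lowering max_registrations stops hard-to-learn messages from being
registered over and over; with a cap of 2, message "a" above would stop
after two registrations even though it still scores as unsure.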

It seems that, whatever its benefit to other people, training on error
from scratch (with or without repetition) doesn't work well for me.  I
could find no way to get the false-positive count low enough without
allowing far too many false negatives.  (One thing I didn't try, which
might have helped, was to use a rather high spam_cutoff and a low
ham_cutoff during training, thus defining "unsure" more broadly than is
done in production and so using more marginal-scoring messages for
training.  I intend to try that and add the results to this writeup.)
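The effect of a wider training-time "unsure" band can be shown with a
small sketch. The cutoff values and the five scored messages below are
made up for illustration, not taken from the experiment; the rule
assumed is the usual train-on-error one (train any message that is
misclassified or unsure), with score >= spam_cutoff meaning spam and
score <= ham_cutoff meaning ham.

```python
from typing import List, Tuple

Scored = Tuple[float, bool]  # (spamicity score, True if actually spam)


def needs_training(score: float, is_spam: bool,
                   spam_cutoff: float, ham_cutoff: float) -> bool:
    """True if the message is misclassified or falls in the unsure band."""
    if is_spam:
        return score < spam_cutoff
    return score > ham_cutoff


def select_for_training(scored: List[Scored],
                        spam_cutoff: float, ham_cutoff: float) -> List[Scored]:
    """Return the messages train-on-error would register at these cutoffs."""
    return [(s, spam) for s, spam in scored
            if needs_training(s, spam, spam_cutoff, ham_cutoff)]


# Hypothetical scores: two confident spams, two confident hams, one marginal.
sample = [(0.95, True), (0.99, True), (0.10, False), (0.01, False), (0.50, True)]

production = select_for_training(sample, spam_cutoff=0.90, ham_cutoff=0.20)
widened = select_for_training(sample, spam_cutoff=0.98, ham_cutoff=0.05)
print(len(production), len(widened))  # prints 1 3
```

With the production cutoffs only the 0.50 spam is trained on; the wider
band also picks up the 0.95 spam and the 0.10 ham, which production
would classify correctly but only marginally.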

The previous experiment, described at
http://www.bgl.nu/bogofilter/reptrain.html
seemed to indicate that a limited number of rounds of repetitive
training might help in the situation where training on error is used
after an initial period of full training.  I may follow that up in a
bit more detail as well.

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |