repetitive-training experiments
Greg Louis
glouis at dynamicro.on.ca
Sun Mar 14 16:24:31 CET 2004
A new experiment, comparing the results I get with full training, with
half full training and half train-on-error, and with train-on-error
under three different maximum registration limits, is written up at
http://www.bgl.nu/bogofilter/reptrain2.html
It seems that, whatever its benefit to other people, training on error
from scratch (with or without repetition) doesn't work well for me. I
could find no way to get the false-positive count low enough without
allowing far too many false negatives. (One thing I didn't try, which
might have helped, was to use a rather high spam_cutoff and a low
ham_cutoff during training, thus defining "unsure" more broadly than in
production and so training on more of the marginally scoring messages.
I intend to try that and add the results to this writeup.)
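A minimal sketch of that training rule may make the idea concrete. It assumes spam scores in [0, 1], with a message classed as spam at or above spam_cutoff, as ham at or below ham_cutoff, and as unsure in between. Train-on-error then registers every message that is not already classified correctly, so widening the unsure band at training time pulls in more marginal messages. The names Cutoffs, classify and needs_training, and all the cutoff values, are hypothetical illustrations, not bogofilter's code or defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cutoffs:
    """Hypothetical three-way thresholds on a spam score in [0, 1]."""
    ham: float   # scores at or below this are classed as ham
    spam: float  # scores at or above this are classed as spam

def classify(score: float, cutoffs: Cutoffs) -> str:
    """Return "spam", "ham" or "unsure" for a score."""
    if score >= cutoffs.spam:
        return "spam"
    if score <= cutoffs.ham:
        return "ham"
    return "unsure"

def needs_training(score: float, is_spam: bool, cutoffs: Cutoffs) -> bool:
    """Train-on-error: register a message unless it is already classified
    correctly; an "unsure" verdict counts as an error."""
    expected = "spam" if is_spam else "ham"
    return classify(score, cutoffs) != expected

# Illustrative values only: training uses a wider unsure band than production.
PRODUCTION = Cutoffs(ham=0.10, spam=0.95)
TRAINING = Cutoffs(ham=0.05, spam=0.99)

# A spam scoring 0.97 is "spam" under the production cutoffs, so plain
# train-on-error skips it; under the training cutoffs it is "unsure" and
# therefore gets registered.
print(needs_training(0.97, True, PRODUCTION))  # False
print(needs_training(0.97, True, TRAINING))    # True
```

Only the classification used to decide what to train changes; the production cutoffs stay as they are for filtering.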
The previous experiment, described at
http://www.bgl.nu/bogofilter/reptrain.html
seemed to indicate that a limited number of rounds of repetitive
training might help when training on error follows an initial period
of full training. I may follow that up in more detail as well.
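For concreteness, repetitive training on error can be sketched as repeated passes over the training corpus. Each pass registers every message that scores on the wrong side of its cutoff or in the unsure band, and the loop stops when a pass trains nothing or a round limit is reached. This sketch assumes that a "maximum registration limit" caps how many times any single message may be registered in total. ToyFilter is a hypothetical stand-in scorer (a smoothed mean of per-token spam fractions), not bogofilter's calculation, and repetitive_toe, max_rounds and max_regs are illustrative names.

```python
from collections import Counter

class ToyFilter:
    """Hypothetical stand-in scorer: the mean, over a message's distinct tokens,
    of a smoothed fraction of registrations that were spam (unseen -> 0.5)."""

    def __init__(self) -> None:
        self.spam = Counter()
        self.ham = Counter()

    def score(self, tokens: list[str]) -> float:
        probs = [(self.spam[t] + 0.5) / (self.spam[t] + self.ham[t] + 1.0)
                 for t in set(tokens)]
        return sum(probs) / len(probs) if probs else 0.5

    def register(self, tokens: list[str], is_spam: bool) -> None:
        (self.spam if is_spam else self.ham).update(set(tokens))


def repetitive_toe(filt, corpus, ham_cutoff, spam_cutoff, max_rounds, max_regs):
    """Repeat train-on-error passes over corpus, a list of (tokens, is_spam).
    A message is registered when it scores on the wrong side or in the unsure
    band, but never more than max_regs times in total.
    Returns the number of registrations per message."""
    regs = [0] * len(corpus)
    for _ in range(max_rounds):
        trained = 0
        for i, (tokens, is_spam) in enumerate(corpus):
            if regs[i] >= max_regs:
                continue  # registration limit reached for this message
            s = filt.score(tokens)
            correct = s >= spam_cutoff if is_spam else s <= ham_cutoff
            if not correct:
                filt.register(tokens, is_spam)
                regs[i] += 1
                trained += 1
        if trained == 0:
            break  # a full pass with no errors: stop early
    return regs


corpus = [(["cheap", "pills"], True), (["meeting", "agenda"], False)]
print(repetitive_toe(ToyFilter(), corpus, 0.2, 0.8, max_rounds=5, max_regs=3))
# [2, 2]
```

With max_regs=1 the same corpus stops after a single registration per message, which is how a tight limit can leave marginal messages still unsure.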
--
| G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
| http://www.bgl.nu/~glouis | (on my website or any keyserver) |
| http://wecanstopspam.org in signatures helps fight junk email. |