Train on error vs. tuning

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Sep 9 15:39:39 CEST 2003


Hi!

I recently thought about some conflicting approaches.

Train on error (as applied by bogominitrain.pl) makes it
impossible to use bogotune: bogotune was already so happy
with the output that it would not try to optimize any
further. (Well, I did optimize before switching to that
approach.)

Now, here is the problem: with train on error, as opposed
to full training, the selection of messages to train with
depends heavily on the configuration. So the resulting
training is close to optimal for exactly these settings.
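
To make that concrete, here is a rough train-on-error
sketch in Python (just an illustration of the idea, not
bogominitrain.pl itself; score(), train(), cutoff and
margin stand in for the real classifier and settings, and
margin is the safety margin for training I mention in
point 2 below):

    def train_on_error(messages, score, train, cutoff=0.501, margin=0.05):
        # messages: iterable of (text, is_spam) pairs
        # score(text): bogosity in [0, 1] under the current settings
        # train(text, is_spam): updates the wordlists
        trained = 0
        for text, is_spam in messages:
            s = score(text)
            looks_spam = s >= cutoff
            # Train only when the current configuration gets it wrong,
            # or scores the message inside the margin around the cutoff;
            # so which messages get trained depends on those settings.
            if looks_spam != is_spam or abs(s - cutoff) < margin:
                train(text, is_spam)
                trained += 1
        return trained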

So here is my question: how important are these settings
after all when train on error is used? Let me give two
examples:

1) I have a very high min_dev, so if something slips
through, it is usually because too few words were used in
calculating the bogosity. So this setting is a factor (see
the sketch after point 2).

2) I have a cutoff at .501 (the result of some tuning). Now
I could just move it to, say, .4 or .6 and train with that.
It would probably not change much. (I add a safety margin
for training, as described in previous messages.) Yes, the
cutoff is related to the values for unseen words, but how
much does that matter here?
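
To illustrate point 1, here is an equally rough sketch of
the min_dev filter (my own simplification, not bogofilter's
actual scoring code): only tokens whose spamicity deviates
from 0.5 by at least min_dev enter the combined score, so
with a very high min_dev a message may be decided by just a
handful of words.

    def tokens_used(spamicities, min_dev):
        # Keep only the token scores that deviate enough from 0.5.
        return [p for p in spamicities if abs(p - 0.5) >= min_dev]

    # With a high min_dev only two of these six tokens are counted,
    # so the verdict rests on very little evidence:
    print(tokens_used([0.52, 0.61, 0.48, 0.95, 0.44, 0.03], min_dev=0.375))
    # -> [0.95, 0.03]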

pi





