training to exhaustion and the risk of overvaluing irrelevant tokens

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Aug 14 14:48:59 CEST 2003


David Relson wrote:

> The difficulty is that each message _can_ err more than once (in the course 
> of multiple passes), hence be trained on more than once.  So the token and 
> message counts no longer reflect a unique set of messages, as the Bayesian 
> theorem assumes.

BTW: Doesn't the theorem assume that the messages you train
with are chosen randomly? A complete training would satisfy
this, but any algorithm which makes non-random decisions
about which messages to train with would already violate
that assumption.

But I don't really care about that. What matters is what
works, not why it works. The idea of train-on-error is to
add only relevant, carefully chosen information. So we are
trying to build a database of a few highly significant
words, with estimates of their probabilities based on those
choices. You then assume this database is a correct picture
of the real world, forget where it came from, and apply the
theorem.
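For illustration, here is a minimal train-on-error sketch in
Python. The word-count database, the token_prob estimate and
the naive-Bayes log combination are my own simplifications
for the example, not bogofilter's actual data structures or
formulas (which follow Robinson/Fisher).

import math
from collections import Counter

spam_counts, ham_counts = Counter(), Counter()
n_spam, n_ham = 0, 0

def token_prob(tok):
    """Estimate P(spam | token) from the counts in the database."""
    s = spam_counts[tok] / max(n_spam, 1)
    h = ham_counts[tok] / max(n_ham, 1)
    if s + h == 0:
        return 0.5                     # unknown token: no information
    return s / (s + h)

def classify(tokens):
    """Combine per-token estimates with Bayes' theorem,
    assuming token independence; work in log space."""
    log_spam = sum(math.log(max(token_prob(t), 1e-6)) for t in tokens)
    log_ham  = sum(math.log(max(1 - token_prob(t), 1e-6)) for t in tokens)
    return log_spam > log_ham          # True -> spam

def train(tokens, is_spam):
    """Add one message's tokens to the database."""
    global n_spam, n_ham
    if is_spam:
        spam_counts.update(set(tokens))
        n_spam += 1
    else:
        ham_counts.update(set(tokens))
        n_ham += 1

def train_on_error(corpus):
    """corpus: list of (tokens, is_spam) pairs.
    Only misclassified messages are added to the database."""
    for tokens, is_spam in corpus:
        if classify(tokens) != is_spam:
            train(tokens, is_spam)

Repeating train_on_error over the corpus until no errors
remain is training to exhaustion, which is exactly the case
David describes: a message can err on several passes and so
be trained on more than once.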

Clearly not pure theory, but reasonable, and it works.

pi




