Importance of ordering in train on error

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Sat Mar 27 20:28:47 CET 2004


Hi!

I just ran a small experiment. When building the database
with training to exhaustion, about 80% of the messages that
end up in it are chosen in the first round. So clearly a lot
of simple messages get used right at the beginning. The idea
was that by ignoring those one could get an even more focused
database. So I took only those messages which were chosen in
the second and later rounds, started with this database, and
trained to exhaustion again. To my surprise, about the same
number of messages was needed in the end.
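
For anyone who wants to play with this, here is a rough
sketch of the experiment in Python. It is only a toy under
loud assumptions: ToyFilter is a made-up token counter, not
bogofilter's scoring, and train_to_exhaustion, seed and
first_round are hypothetical names. It records the round in
which each message was first trained on, and optionally
starts from a database seeded with chosen messages.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a token database (NOT bogofilter's algorithm)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def score(self, tokens):
        # Positive leans spam, zero or negative leans ham.
        return sum(self.spam[t] - self.ham[t] for t in tokens)

    def classify(self, tokens):
        return self.score(tokens) > 0

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(tokens)


def train_to_exhaustion(messages, seed=(), max_rounds=100):
    """Train on error, round after round, until a full pass has no errors.

    messages: list of (tokens, is_spam) pairs.
    seed:     messages registered before the first round.
    Returns (filter, first_round), where first_round maps a message
    index to the round in which it was first trained on.
    """
    filt = ToyFilter()
    for tokens, is_spam in seed:
        filt.train(tokens, is_spam)

    first_round = {}
    for rnd in range(1, max_rounds + 1):
        errors = 0
        for i, (tokens, is_spam) in enumerate(messages):
            if filt.classify(tokens) != is_spam:
                filt.train(tokens, is_spam)
                first_round.setdefault(i, rnd)
                errors += 1
        if errors == 0:
            break  # nothing misclassified in this pass
    return filt, first_round
```

To mimic the second run, collect the messages whose entry in
first_round is 2 or higher, pass them as seed, and compare
len(seed) + len(first_round) of the new run with
len(first_round) of the original one. A toy like this says
nothing about real mail, of course.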

This suggests that there is not much room for optimizing
the choice of messages.

One idea (but that would take forever to finish) would be to
always take the worst failure.
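
A minimal sketch of what that could look like, reading "worst
failure" as the misclassified message whose score lies
furthest on the wrong side. Everything here is an assumption
for illustration: the token-count score is a toy, not
bogofilter's, and train_worst_first is a hypothetical name.
Each step rescans the whole corpus to train on one single
message, which is why this gets so slow.

```python
from collections import Counter


def score(spam, ham, tokens):
    # Toy score: positive leans spam, zero or negative leans ham.
    return sum(spam[t] - ham[t] for t in tokens)


def train_worst_first(messages, max_steps=10_000):
    """Repeatedly train on the single worst misclassified message.

    messages: list of (tokens, is_spam) pairs.
    Returns (spam, ham, trained), where trained lists message indices
    in the order they were trained on.
    """
    spam, ham = Counter(), Counter()
    trained = []
    for _ in range(max_steps):
        worst, worst_margin = None, None
        # Full pass over the corpus just to pick one message.
        for i, (tokens, is_spam) in enumerate(messages):
            s = score(spam, ham, tokens)
            if (s > 0) == is_spam:
                continue  # classified correctly
            # Distance on the wrong side of the cutoff at zero.
            margin = -s if is_spam else s
            if worst is None or margin > worst_margin:
                worst, worst_margin = i, margin
        if worst is None:
            break  # no failures left
        tokens, is_spam = messages[worst]
        (spam if is_spam else ham).update(tokens)
        trained.append(worst)
    return spam, ham, trained
```

With N messages and K training steps this costs on the order
of N*K scorings, compared to N per round for plain train on
error.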

pi



