repetitive training

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Mar 9 15:07:02 CET 2004


Greg Louis wrote:

>> > already; for iteration n, it's reported on line n-1 of the table of
>> > error rates.
>> 
>> I don't understand that. I thought those were the numbers
>> before the next training.
> 
> Same thing: the output of the previous run is the error rate before the
> next run.

Maybe it is easier to understand with an example. Let's look
at the second experiment:

 run testfn trainfp trainfn testfnpc trainfppc trainfnpc
   0   1065      67      21     2.69     0.371     0.106
   1   1022      67      21     2.58     0.371     0.106
   2    941      67      21     2.38     0.371     0.106
[...]

The last three columns just restate the preceding three as
percentages, so only the absolute counts carry information.
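For instance, the percentage columns can be reproduced from the counts alone. The set sizes are not printed in the table, but any single row implies them; a small sketch (the totals below are inferred from row 0, not stated anywhere in the output):

```python
def implied_total(count, pc):
    """Infer the (unprinted) message-set size from one count/percentage pair."""
    return count / (pc / 100.0)

# Row 0: testfn = 1065 at 2.69% implies a test set of roughly 39,600 messages.
test_total = implied_total(1065, 2.69)

# The later rows' percentages then follow from the same total:
for count in (1022, 941):
    print(round(100.0 * count / test_total, 2))  # 2.58, 2.38 as in the table
```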

testfn refers to the test set, while train* refers to the
training set. Now the latter are -- as you explain -- the
numbers that will be used in the next round, so they must be
measured after tuning, right? Where are the numbers from
before that?

>> Also, as I mentioned earlier it is very suspicious that you
>> have for ten rounds exactly the same number of messages to
>> train with.
> 
> It would be, except that what I did was scan all those messages but
> train with only the errors, as this sentence from the writeup explains:
>
> The following steps were then iterated: the smaller parts were
> classified using the newly determined parameters, and the messages that
> were classified wrongly or as unsure were used to train the database
> further.

Right, but it is very unlikely that this is the same number
every time.
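The train-on-error iteration quoted above can be sketched roughly as follows. This is only an illustration: ToyClassifier, Message and train_on_error are stand-ins I made up, not bogofilter's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    label: str  # "spam" or "ham"

class ToyClassifier:
    """Minimal stand-in for a word-based filter: per-class token counts."""
    def __init__(self):
        self.counts = {"spam": defaultdict(int), "ham": defaultdict(int)}

    def train(self, text, label):
        for tok in text.split():
            self.counts[label][tok] += 1

    def classify(self, text):
        score = {c: sum(self.counts[c][t] for t in text.split())
                 for c in ("spam", "ham")}
        if score["spam"] == score["ham"]:
            return "unsure"
        return "spam" if score["spam"] > score["ham"] else "ham"

def train_on_error(classifier, messages, rounds):
    """Each round: classify everything, then train only on the messages
    that were classified wrongly or as unsure. Returns how many messages
    were actually trained in each round."""
    trained_per_round = []
    for _ in range(rounds):
        errors = [m for m in messages
                  if classifier.classify(m.text) != m.label]
        for m in errors:
            classifier.train(m.text, m.label)
        trained_per_round.append(len(errors))
    return trained_per_round
```

With this scheme the number of trained messages normally shrinks from round to round as errors get corrected, which is why a count that stays identical for ten rounds looks suspicious.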

> Again, the detailed log will tell us how many messages were used in
> each round of training, and I'll put that up with the parameter values.
> Probably get to it on the coming weekend.

Great. Something must have gone wrong.

pi

More information about the Bogofilter mailing list