repetitive training
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Tue Mar 9 15:07:02 CET 2004
Greg Louis wrote:
>> > already; for iteration n, it's reported on line n-1 of the table of
>> > error rates.
>>
>> I don't understand that. I thought those were the numbers
>> before the next training.
>
> Same thing: the output of the previous run is the error rate before the
> next run.
Maybe it is easier to understand with an example. Let's look
at the second experiment:
run testfn trainfp trainfn testfnpc trainfppc trainfnpc
0 1065 67 21 2.69 0.371 0.106
1 1022 67 21 2.58 0.371 0.106
2 941 67 21 2.38 0.371 0.106
[...]
The last three columns just replicate the first three as
percentages, so only the first three matter.
testfn refers to the test set, while train* refers to the
training set. Now the latter are the numbers which -- as you
explain -- will be used in the next round, so they must have
been measured after tuning, right? Then where are the numbers
from before that?
>> Also, as I mentioned earlier it is very suspicious that you
>> have for ten rounds exactly the same number of messages to
>> train with.
>
> It would be, except that what I did was scan all those messages but
> train with only the errors, as this sentence from the writeup explains:
>
> The following steps were then iterated: the smaller parts were
> classified using the newly determined parameters, and the messages that
> were classified wrongly or as unsure were used to train the database
> further.
Right, but it is very unlikely that this is the same number
every time.
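The train-on-error loop quoted above can be sketched as follows. classify() and train() are toy stand-ins, not bogofilter's actual interface; here a message stays misclassified only until it has been trained once:

```python
# Toy stand-ins for the real classifier and trainer (assumptions,
# not bogofilter's API): a message is "wrong" until it is trained.
def classify(db, msg):
    return "right" if msg in db else "wrong"

def train(db, msg):
    db.add(msg)

def train_on_error(messages, rounds):
    db = set()
    counts = []                          # messages trained in each round
    for _ in range(rounds):
        # Scan everything, but train only on the errors.
        errors = [m for m in messages if classify(db, m) == "wrong"]
        for m in errors:
            train(db, m)
        counts.append(len(errors))
    return counts

# With this toy classifier every error is fixed immediately, so the
# per-round count drops; getting the identical count ten times in a
# row would be surprising for any reasonable classifier.
print(train_on_error(["a", "b", "c"], 3))
```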
> Again, the detailed log will tell us how many messages were used in
> each round of training, and I'll put that up with the parameter values.
> Probably get to it on the coming weekend.
Great. Something must have gone wrong.
pi