incomplete experiment re repetitive tuning (longish)

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Feb 12 15:58:32 CET 2004


Greg Louis wrote:

> The smaller parts were used to create a training database by "full
> training" and the optimal bogofilter parameters were determined using
> bogotune, with the newly created training database and the larger parts
> of the original corpora for testing.  These test messages remained
> "new" in that none of them were ever used for training.

As I said before, this is very far from what train-on-error
is, so I cannot see how those results are relevant. Why not
simply do that?
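
To make sure we mean the same thing, here is a minimal sketch
of the difference in Python. classify() and train() are
illustrative stand-ins for running bogofilter, not its actual
interface:

  # Hypothetical helpers standing in for bogofilter runs.
  def classify(db, msg):
      # Returns "spam", "ham" or "unsure" for msg scored against db.
      ...

  def train(db, msg, label):
      # Registers msg as label in the wordlist db.
      ...

  def full_training(db, corpus):
      # "Full training": register every message, right or wrong.
      for msg, label in corpus:
          train(db, msg, label)

  def train_on_error(db, corpus):
      # Train-on-error: classify first, then register only the
      # messages the current database got wrong or left unsure.
      errors = [(m, l) for m, l in corpus if classify(db, m) != l]
      for m, l in errors:
          train(db, m, l)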

Also, tuning does not seem to fit in here, but maybe we can
learn something from it. In any case you severely violate the
idea, which was to fix the parameters once and then train to
match them. If you change them later, that leads to
completely unpredictable results.
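
In code terms, continuing the sketch above, the protocol is:
tune once, freeze the result, then train against it.
bogotune_once() is a hypothetical wrapper around a single
bogotune run, and tuning_sets/training_corpus are
placeholders:

  params = bogotune_once(db, tuning_sets)  # run ONCE, up front

  for it in range(10):
      # classify() is assumed to apply the frozen cutoffs in
      # params via the bogofilter configuration.
      train_on_error(db, training_corpus)
      # Do NOT re-run bogotune here: changing spam_cutoff etc.
      # between rounds moves the target the training is trying
      # to match.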

> The following steps were then iterated: the smaller parts were
> classified using the newly determined parameters, and the messages that
> were classified wrongly or as unsure were used to train the database
> further. 

Presumably this happens in one batch in your experiment,
right? So if you have several similar messages, you would add
all of them, while I typically add only the first.
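
Continuing the sketch, the incremental variant I mean retests
every message against the database as it stands right now:

  def train_on_error_incremental(db, corpus):
      # Unlike the batch version above, the database is updated
      # immediately, so once the first of several similar spams
      # is registered, its near-duplicates usually score
      # correctly and are not trained again.
      for msg, label in corpus:
          if classify(db, msg) != label:
              train(db, msg, label)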

> (the columns show the number of iterations of training, the
> expected false negatives reported by bogotune, and the numbers of
> nonspam and spam that were wrongly or uncertainly classified and were
> therefore used in the next round of training; the right-hand three
> columns show the same results in percentage form):
> 
>  run testfn trainfp trainfn testfnpc trainfppc trainfnpc
>    0    520       1     160     2.59   0.00993      1.59
>    1    505       1     147     2.51   0.00993      1.46
>    2    481       1     126     2.39   0.00993      1.25
>    3    510       0     127     2.54   0.00000      1.26
>    4    477       1     113     2.37   0.00993      1.12
>    5    649      54     173     3.23   0.53646      1.72
>    6    724       9     174     3.60   0.08941      1.73

It would be important to know what changed. What happened to
the parameters? What you have is very strange. Usually your
trainfn should go down quickly; it does not do so a single
time. That suggests something is very wrong, probably the
choice of parameters. Why would you choose parameters that
give you 54 false positives here?

Also not recorded is your concern about repeatedly training
with the same messages. It is not clear whether you observed
that.
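
That would be easy to make visible, for example by logging a
digest of every message that gets registered; a hypothetical
instrumentation sketch around train() from above:

  import hashlib

  seen = set()  # digests of messages already used for training

  def train_logged(db, msg, label):
      # Report when the same message comes up for training
      # again in a later round, then register it as usual.
      digest = hashlib.sha1(msg.encode()).hexdigest()
      if digest in seen:
          print("trained again:", digest[:12])
      seen.add(digest)
      train(db, msg, label)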

>  run testfn trainfp trainfn testfnpc trainfppc trainfnpc
>    0   1065      67      21     2.69     0.371     0.106
>    1   1022      67      21     2.58     0.371     0.106
>    2    941      67      21     2.38     0.371     0.106
>    3    678      67      21     1.71     0.371     0.106
>    4    619      67      21     1.56     0.371     0.106
>    5    643      67      21     1.62     0.371     0.106
>    6    589      67      21     1.49     0.371     0.106
>    7    567      67      21     1.43     0.371     0.106
>    8    576      67      21     1.46     0.371     0.106
>    9    521      67      21     1.32     0.371     0.106

That trainfp and trainfn don't change at all is a sign of
total failure; something should change. I would guess that
exactly the same messages are in question every round
(otherwise a fixed count would be an even bigger surprise),
and at least some of them should have been learned.

So what do we learn? Well, maybe just that the experiment
does the wrong thing. Tuning can introduce new errors.

pi
