final version of reptrain2

Boris 'pi' Piwinger 3.14 at
Tue Mar 16 13:33:45 CET 2004

Greg Louis wrote:

>> >The report at is, I hope,
>> >now in its final form.  
>> As I see you responded to some concerns I raised.
> That's nice.  I haven't read your earlier response -- scanned it
> quickly and put it aside for later consideration -- so it was
> serendipitous ;)

Even better. Maybe there is more when you come to it.

>> But there are still open questions like the use of bogotune. It looks
>> like it gets a preview on the messages that are later used for
>> counting mistakes.
> Just so.  I wanted the best possible accuracy on the test, so if the
> training db was big enough I tuned with the test message set and then
> ran the test on the two halves with the resulting parameters, knowing
> the general outcome beforehand from the tuning run.

Indeed this is a flaw in the test.

> While this did
> operate to the disadvantage of the train-on-error-from-scratch runs,
> for which no valid bogotune run could be performed, the differences in
> accuracy were greater than could be explained by this factor.

I guess so. Still, for the sake of the argument it would be
better not to include the run of bogotune. It is never clear
how much of a difference this makes.

> Normally one would train with one group of messages, tune with a second
> and classify a third if one wanted to mimic production.  

Yes, but of course that reduces the number of messages you
have to work with.
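
To make the trade-off concrete, here is a minimal sketch in Python of
the three-way split described above (train on one group, tune on a
second, classify a third). The function name, the 50/25/25 fractions
and the fixed seed are assumptions for illustration only; nothing here
is taken from the reptrain2 report or from bogofilter itself.

```python
import random


def three_way_split(messages, seed=0, fractions=(0.5, 0.25, 0.25)):
    """Shuffle messages and cut them into disjoint train/tune/test lists.

    `fractions` gives the share for train and tune; the test set takes
    whatever remains, so no message is dropped or used twice.
    (The default shares are an assumption, not a recommendation.)
    """
    shuffled = list(messages)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_tune = int(n * fractions[1])
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]
    return train, tune, test


if __name__ == "__main__":
    # Hypothetical message identifiers standing in for a real corpus.
    ids = [f"msg{i:04d}" for i in range(1000)]
    train, tune, test = three_way_split(ids)
    print(len(train), len(tune), len(test))  # prints: 500 250 250
```

With 1000 messages only 250 are left for counting mistakes, which is
the reduction I mean. The gain is that the tuning step never sees the
messages that are scored afterwards.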

> Those who want to prove the merits of on-error-from-scratch or
> repetitive (to exhaustion or with a limit) training are invited to
> publish full accounts of their experiments, in enough detail that I (or
> anyone) could repeat them on my own message corpora if I wished to do
> so. 

BTDT:-)) Actually, David created a test shell script to do
it. I made several improvements to it; it should still be
around, and I used it in the tests I posted here.

