final version of reptrain2

Greg Louis glouis at dynamicro.on.ca
Tue Mar 16 13:28:01 CET 2004


On 20040316 (Tue) at 0852:00 +0100, Boris 'pi' Piwinger wrote:
> Greg Louis <glouis at dynamicro.on.ca> wrote:
> 
> >The report at http://www.bgl.nu/bogofilter/reptrain2.html is, I hope,
> >now in its final form.  
> 
> As I see, you responded to some concerns I raised.

That's nice.  I haven't read your earlier response -- scanned it
quickly and put it aside for later consideration -- so it was
serendipitous ;)

> But there are still open questions like the use of bogotune. It looks
> like it gets a preview on the messages that are later used for
> counting mistakes.

Just so.  I wanted the best possible accuracy on the test, so if the
training db was big enough I tuned with the test message set and then
ran the test on the two halves with the resulting parameters, knowing
the general outcome beforehand from the tuning run.  While this did
operate to the disadvantage of the train-on-error-from-scratch runs,
for which no valid bogotune run could be performed, the differences in
accuracy were greater than could be explained by this factor.

Normally one would train with one group of messages, tune with a second
and classify a third if one wanted to mimic production.  I see no
necessity to mimic production for the purpose of this comparison, and
it made better sense to me to get more messages into the test runs.
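
For illustration, the three-group design just described (train on one
group, tune on a second, classify a third) could be sketched as below.
This is only a sketch and not what was done for the report, where
tuning and testing shared the same messages.  The function name
three_way_split, the 50/25/25 proportions and the fixed seed are all
assumptions made for the example; nothing here calls bogofilter or
bogotune.

```python
import random


def three_way_split(messages, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle messages and split them into disjoint train/tune/test groups.

    The fractions and seed are illustrative defaults, not values taken
    from the report.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    # Copy first so the caller's list is left untouched, then shuffle
    # reproducibly with a private random generator.
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_tune = int(n * fractions[1])

    train = shuffled[:n_train]                  # used to build the wordlist
    tune = shuffled[n_train:n_train + n_tune]   # used to choose parameters
    test = shuffled[n_train + n_tune:]          # used only to count mistakes
    return train, tune, test


if __name__ == "__main__":
    msgs = [f"msg{i}" for i in range(100)]
    train, tune, test = three_way_split(msgs)
    print(len(train), len(tune), len(test))
```

Keeping the third group out of both training and tuning is what makes
its error counts resemble production; the price is fewer messages in
the test runs.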

Anyway, whether you accept or reject my experimental design, it is not
worthwhile for us to discuss it, given our disagreement on fundamental
principles (which we have argued to exhaustion on earlier occasions;
bis xhrambe thanatos).
 
Those who want to prove the merits of on-error-from-scratch or
repetitive (to exhaustion or with a limit) training are invited to
publish full accounts of their experiments, in enough detail that I (or
anyone) could repeat them on my own message corpora if I wished to do
so.  I can make no promise to do any such thing, though, even if I deem
the experimental designs correct.

Those who don't care about proof, and are getting results they like
with whatever they're doing, are doing the right thing :)

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



