repetitive training

Tue Mar 9 14:54:58 CET 2004

On 20040309 (Tue) at 1344:40 +0100, Boris 'pi' Piwinger wrote:

> > already; for iteration n, it's reported on line n-1 of the table of
> > error rates.
> 
> I don't understand that. I thought those were the numbers
> from before the next training.

Same thing: the output of the previous run is the error rate before the
next run.

> Also, as I mentioned earlier, it is very suspicious that you
> have exactly the same number of messages to train with in
> each of the ten rounds.

It would be, except that what I did was scan all those messages but
train with only the errors, as this sentence from the writeup explains:

    The following steps were then iterated: the smaller parts were
    classified using the newly determined parameters, and the messages
    that were classified wrongly or as unsure were used to train the
    database further.

There's an error in the runex script on the website (I'll fix that, of
course).  Where it says
    # train on wrongly classified messages
    bogofilter -d db -s <$src
that should be
    bogofilter -d db -s <sptrain
    bogofilter -d db -n <nstrain
which is what the real runex script actually contained.  The version on
the site could not have run, since the variable src is never defined.
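
For anyone following along, one round of the train-on-error step could
be sketched roughly as below.  This is not the real runex script: the
collect_errors helper, the CLASSIFY variable, the chunk/ paths and the
one-message-per-file layout are all assumptions made for illustration;
only the two training commands are taken from the script.  The sketch
relies on bogofilter's exit codes (0 spam, 1 nonspam, 2 unsure) and
assumes each message file starts with an mbox "From " line, so that the
concatenated sptrain and nstrain files can be read as mailboxes.

```shell
# Hypothetical sketch of one train-on-error round (not the real runex).
# CLASSIFY is an assumed hook; it must exit 0 for spam, 1 for nonspam,
# 2 for unsure, as bogofilter does.
CLASSIFY=${CLASSIFY:-"bogofilter -d db"}

# collect_errors DIR EXPECTED OUT
# Writes to OUT every message file in DIR whose verdict is not EXPECTED
# (0 = spam, 1 = nonspam).  An unsure verdict (2) counts as an error.
collect_errors() {
    dir=$1
    expected=$2
    out=$3
    : >"$out"                       # start with an empty training file
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue   # skip non-files and an unmatched glob
        rc=0
        $CLASSIFY <"$msg" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -gt 2 ]; then
            # exit codes above 2 signal a failure, not a verdict
            echo "classifier failed on $msg (exit $rc)" >&2
            continue
        fi
        if [ "$rc" -ne "$expected" ]; then
            cat "$msg" >>"$out"     # wrong or unsure: keep for training
        fi
    done
}

# One round would then be (chunk/spam and chunk/ham are assumed paths):
#   collect_errors chunk/spam 0 sptrain
#   collect_errors chunk/ham  1 nstrain
#   bogofilter -d db -s <sptrain
#   bogofilter -d db -n <nstrain
```

Every message in the chunk is scanned, but only the ones that land in
sptrain or nstrain are trained on, which is why the number scanned can
stay the same from round to round while the number trained does not.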

Again, the detailed log will tell us how many messages were used in
each round of training, and I'll put that up with the parameter values.
I'll probably get to it this coming weekend.

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |