repetitive-training experiments

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Mar 15 08:13:54 CET 2004


Greg Louis <glouis at dynamicro.on.ca> wrote:

>A new experiment comparing results I get with full training, half full
>and half train-on-error, and train-on-error with three different
>maximum registration limits is written up at
>http://www.bgl.nu/bogofilter/reptrain2.html

"Once people started training on error -- some folks even
skipped the initial stage of building the database to
reasonable size by full training, though that has been shown
to be less efficient"

Other experiments have shown the opposite: that it is the
initial full training which is less efficient ...
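
For reference, train-on-error means registering a message
only when the filter gets it wrong (or scores it unsure).
A minimal sketch of one pass, assuming bogofilter's -s/-n
registration flags and its exit codes 0=spam, 1=ham,
2=unsure; the driver itself is hypothetical:

  import subprocess

  def classify(path):
      # bogofilter reads the message on stdin; the exit code is the verdict
      with open(path, "rb") as f:
          return subprocess.run(["bogofilter"], stdin=f).returncode

  def train_on_error(messages):
      # messages: (path, is_spam) pairs in arrival order;
      # returns how many registrations this pass performed
      corrections = 0
      for path, is_spam in messages:
          verdict = classify(path)
          if verdict == 2 or (verdict == 0) != is_spam:
              # register only unsure or misclassified messages
              flag = "-s" if is_spam else "-n"
              with open(path, "rb") as f:
                  subprocess.run(["bogofilter", flag], stdin=f)
              corrections += 1
      return corrections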

"Comparisons in accuracy are only valid when the number of
false positives is held constant."

I have given examples of why this can severely mislead: the
message score distribution can be quite irregular, so even
small changes in the false-positive target lead to very
large jumps in false negatives. I have even seen cases
where one test had both fewer false positives and fewer
false negatives than another, yet after adjusting the
cutoffs this way it looked worse than the other.
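
A toy illustration with invented scores: here one ham sits
just below and one just above a dense clump of spam scores,
so demanding even one fewer false positive forces the
cutoff across the whole clump at once.

  ham_scores  = [0.10, 0.20, 0.30, 0.91, 0.95]        # two hams score high
  spam_scores = [0.92, 0.92, 0.93, 0.93, 0.94, 0.99]  # most spam clusters near 0.93

  def errors(cutoff):
      fp = sum(s >= cutoff for s in ham_scores)   # ham classified as spam
      fn = sum(s < cutoff for s in spam_scores)   # spam classified as ham
      return fp, fn

  print(errors(0.90))   # (2, 0): allowing 2 fp gives 0 fn
  print(errors(0.95))   # (1, 5): one fp fewer, and fn jumps from 0 to 5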

But anyhow, I don't see that your table does what you say:
if comparisons were only made at a constant false-positive
count, every experiment would show the same number of false
positives, but they don't.

When you run bogotune, I get the impression that you tune
on the same messages you later use for testing; is that
correct? That would bias the comparison.

"With these small databases, I thought, there will be lots
of unknown and low-count tokens, which might be inherently
unreliable."

I cannot see that. Train-on-error naturally produces such
tokens and makes use of them; since the method often sees
no need to register more instances of the same tokens,
their counts stay low. So robs=1 seems to work against it.
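
For reference, this is Robinson's smoothing as I understand
bogofilter applies it, with s = robs and x = robx: with
robs=1 a token seen only once is pulled halfway back toward
robx, which is exactly the damping that hits the low-count
tokens train-on-error relies on.

  def f_w(p, n, s=1.0, x=0.5):
      # p: raw spamminess estimate for the token, n: messages containing it;
      # s (robs) shrinks the estimate toward the prior x (robx)
      return (s * x + n * p) / (s + n)

  print(f_w(p=1.0, n=1))          # 0.75 with robs=1: heavily damped
  print(f_w(p=1.0, n=1, s=0.1))   # ~0.95 with robs=0.1: nearly full strength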

>It seems that, whatever its benefit to other people, training on error
>from scratch (with or without repetition) doesn't work well for me.  

Looks like it. Indeed, a single pass always produced bad
results for me, too, though not as bad as yours. Even with
safety margins, one run of randomtrain dramatically
outperformed your half training and full training:
http://article.gmane.org/gmane.mail.bogofilter.general/5403

Interestingly enough, when I repeated the run with more
messages, randomtrain's false-positive count went down.

>I could find no way to get the false-positive count low enough without
>allowing far too many false negatives.  

I don't really understand why you want to have so many
unsures in the first place. After all, every unsure is an
error in real life.

>(One thing I didn't try, that
>might have helped, was to use a rather high spam_cutoff and low
>ham_cutoff during training, thus defining "unsure" more broadly than is
>done in production and so using more marginal-scoring messages for
>training.  I intend to try that and add the results to this writeup.)

That would be very useful. Or you could run your tests
without unsures at all, with a single cutoff of about 0.4.
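
If I remember the config syntax correctly, that would be
something like this in bogofilter.cf, on the assumption
that setting both cutoffs equal collapses the unsure band
entirely:

  spam_cutoff=0.4
  ham_cutoff=0.4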

You could also include training to exhaustion in your
comparison. The main difference, of course, is that it goes
back over what was done at the beginning, while your tests
rely on a single pass through all messages, possibly doing
unneeded registrations early on that are never balanced out
later.
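
Training to exhaustion just repeats such error-driven
passes until one pass makes no corrections. A sketch
reusing the hypothetical train_on_error driver from above,
with the random reordering randomtrain also uses:

  import random

  def train_to_exhaustion(messages, max_passes=20):
      for i in range(max_passes):
          random.shuffle(messages)           # vary the order between passes
          if train_on_error(messages) == 0:
              return i + 1                   # a clean pass: database is stable
      return max_passes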

pi



