training to exhaustion?

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Mar 9 13:32:44 CET 2004


Greg Louis wrote:

>> Also you will
>> need to train with ham. Best is to switch between them.
>> 
> Practical question: If I set up a Tom-Anderson-like experiment (see his
> recent posting in praise of repetitive training), and it works, then
> after a relatively short time there will be precious few wrongly
> classified nonspam among the new test messages.  Almost all the errors
> will be unsure spam. 

I am not sure I fully understand Tom's setup. But I would
expect that to be balanced for ham and spam.

> pi, do you occasionally pad your training db with
> correctly classified nonspam to rebalance the message counts, or do you
> let it get lopsided?

I don't do anything like this. But since correction runs go
over the complete collection, some ham messages now fall
into the security interval (or are classified incorrectly)
and are therefore used in training. So the balance happens
naturally.
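
To make the correction run concrete, here is a rough sketch in Python.
It is not bogofilter's code or API: ToyFilter, needs_training,
train_to_exhaustion, the cutoff of 0.5 and the margin of 0.1 are all
made up for illustration. The idea it shows is only this: go over the
complete collection, train on every message that is on the wrong side
of the cutoff or inside the security interval, and repeat until a full
pass trains nothing.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a token-count filter (not bogofilter's API)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        # Count each distinct token once per trained message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Mean per-token spam fraction with add-one smoothing.
        # 0.5 means "no evidence either way".
        toks = set(tokens)
        if not toks:
            return 0.5
        fractions = [
            (self.spam[t] + 1) / (self.spam[t] + self.ham[t] + 2) for t in toks
        ]
        return sum(fractions) / len(fractions)


def needs_training(score, is_spam, cutoff=0.5, margin=0.1):
    # Assumed rule: train when the message is misclassified or lies
    # inside the security interval [cutoff - margin, cutoff + margin].
    if is_spam:
        return score < cutoff + margin
    return score > cutoff - margin


def train_to_exhaustion(filt, collection, max_passes=50):
    """Repeat correction runs over the whole collection until nothing is trained."""
    trained = Counter()
    for _ in range(max_passes):
        changed = False
        for tokens, is_spam in collection:
            if needs_training(filt.score(tokens), is_spam):
                filt.train(tokens, is_spam)
                trained["spam" if is_spam else "ham"] += 1
                changed = True
        if not changed:
            break
    return trained
```

Because every pass covers ham as well as spam, ham that drifts into
the interval after spam training gets picked up on the same or the
next pass, without any explicit padding of the database.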

pi



