Method of training

Fri Sep 5 08:44:39 CEST 2003

jxz <jxz at uol.com.br> wrote:

>Afterwards, I would manually classify the messages from this temp mbox,
>send them to the train.ham.35 and train.spam.35 mboxes, and train bogofilter.
>
>With this method, I should only train bogofilter on its errors, and
>there will be no need to save the whole cruft of spam, only the unsures,
>in case of db corruption or db schema updates.

I would not suggest this. Every time you train with new
messages, the rating of all previously seen messages
changes. I have described an example where the rating moves
in an unexpected direction. So what does that mean for you?
If you save only those messages which were unsure (or
failures) when they were first seen, you will lose
significant information when you retrain. So my advice is
to keep all those messages.
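
To illustrate how training on a new message shifts the rating
of one seen earlier, here is a minimal sketch. It assumes a toy
word-count score, not bogofilter's actual algorithm; the names
train and rating and the sample messages are made up for the
example.

```python
from collections import Counter

# Toy word lists; an assumption for illustration, not bogofilter's database.
spam_words = Counter()
ham_words = Counter()

def train(text, is_spam):
    """Count each distinct word of the message once in the matching list."""
    target = spam_words if is_spam else ham_words
    target.update(set(text.lower().split()))

def rating(text):
    """Share of known word hits that come from spam; 0.5 if nothing is known."""
    words = set(text.lower().split())
    s = sum(spam_words[w] for w in words)
    h = sum(ham_words[w] for w in words)
    return 0.5 if s + h == 0 else s / (s + h)

train("free offer", is_spam=True)
train("meeting notes", is_spam=False)

old_message = "free meeting offer"
before = rating(old_message)   # 2 spam hits, 1 ham hit

# Training on an unrelated new ham message changes the old rating.
train("free lunch meeting", is_spam=False)
after = rating(old_message)    # 2 spam hits, 3 ham hits

print(before, after)
```

The old message was not touched, yet its rating dropped. A
message that was rated correctly when first seen (and so was
not saved) may therefore be exactly the one needed at retrain
time.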

>Now I ask: does the train-on-error method work well?

It works excellently. You can even get by with fewer
messages, because otherwise you add messages that are
already rated correctly at the time you add them. See the
FAQ for details on the training methods.
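
The idea of train-on-error can be sketched as follows. This is
only an illustration under stated assumptions: ToyFilter is a
hypothetical word-count classifier with made-up cutoffs, not
bogofilter itself, and the four-message corpus is invented.

```python
from collections import Counter

class ToyFilter:
    """Hypothetical word-count classifier; not bogofilter's algorithm."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def score(self, text):
        # Share of known word hits that come from spam; 0.5 if nothing is known.
        words = set(text.lower().split())
        s = sum(self.spam[w] for w in words)
        h = sum(self.ham[w] for w in words)
        return 0.5 if s + h == 0 else s / (s + h)

    def classify(self, text, ham_cutoff=0.4, spam_cutoff=0.6):
        # Cutoff values are assumptions chosen for this example.
        p = self.score(text)
        if p >= spam_cutoff:
            return "spam"
        if p <= ham_cutoff:
            return "ham"
        return "unsure"

    def train(self, text, label):
        target = self.spam if label == "spam" else self.ham
        target.update(set(text.lower().split()))

def train_on_error(filt, corpus):
    """Train only on messages currently rated wrongly or as unsure."""
    trained = 0
    for text, label in corpus:
        if filt.classify(text) != label:
            filt.train(text, label)
            trained += 1
    return trained

# The full corpus is kept, even though only part of it gets trained.
corpus = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("buy cheap pills today", "spam"),
    ("agenda for the monday meeting", "ham"),
]

filt = ToyFilter()
used = train_on_error(filt, corpus)
print(used, "of", len(corpus), "messages were trained")
```

Only the first two messages are trained here; the last two are
already rated correctly when they come up, so they are skipped.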

>Or do I need to receive
>hundreds of thousands of trillions of billions of emails for it to begin
>to be accurate? :)

No, but several thousand would be useful.

>What do you think of this method, what method do you currently use, and
>are you satisfied with its accuracy?

Absolutely. I estimate my error rate at less than one message
a day, out of the 150 to 300 messages I receive daily.

pi