New new script to train bogofilter

Peter Bishop pgb at adelard.com
Fri Jul 4 15:10:32 CEST 2003


On 4 Jul 2003 at 7:06, Greg Louis wrote:

> So the problem you may see down the road if you train "to extinction"
> on errors is that bogofilter will get very good at recognizing the
> types of messages on which you train, and rather poor at recognizing
> messages that are similar to, but not strongly similar to, those
> training ones.  It's like trying to recognize dogs by training on
> German shepherds only: a great Dane shares the "dog" characteristics,
> but we'll have learned too many specific "German shepherd" ones so we
> may well misclassify the great Dane.

Hmm

I think what Boris is doing can be put another way, i.e.:
"If you could only put some fixed number of messages, e.g. 1000,
into the database, which ones would you choose out of a larger population
of messages?"

So, following your dog analogy, the train-on-error procedure only puts a 
message into the database if it looks different, i.e. like putting 
different breeds of dog into a picture book.

So you end up with a picture book full of different breeds, rather than a 
picture book where there are a lot of dogs of the same type while others are 
missed completely.
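
To make the idea concrete, here is a rough Python sketch of a train-on-error
pass. It is not bogofilter's actual scoring: ToyFilter, train_on_error and the
sample messages are all made up for illustration, and the score is just a
crude smoothed log-ratio of token counts. The only point is that a message
goes into the database solely when the current database gets it wrong.

```python
import math
from collections import Counter


def tokens(msg):
    """Split a message into a set of lowercase words (toy tokenizer)."""
    return set(msg.lower().split())


class ToyFilter:
    """Hypothetical stand-in for a word-count database, not bogofilter itself."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.trained = {"spam": 0, "ham": 0}

    def train(self, msg, label):
        # Record which tokens appeared in one more message of this class.
        self.counts[label].update(tokens(msg))
        self.trained[label] += 1

    def classify(self, msg):
        # Sum of add-one smoothed log-ratios; assumed scoring, for illustration.
        score = 0.0
        for t in tokens(msg):
            p_spam = (self.counts["spam"][t] + 1) / (self.trained["spam"] + 2)
            p_ham = (self.counts["ham"][t] + 1) / (self.trained["ham"] + 2)
            score += math.log(p_spam) - math.log(p_ham)
        return "spam" if score > 0 else "ham"


def train_on_error(filt, labelled_messages):
    """Add a message to the database only if it is currently misclassified."""
    stored = []
    for msg, label in labelled_messages:
        if filt.classify(msg) != label:
            filt.train(msg, label)
            stored.append(msg)
    return stored


messages = [
    ("buy cheap pills now", "spam"),
    ("buy cheap pills now", "spam"),        # same "breed": already recognised
    ("meeting agenda for monday", "ham"),
    ("buy cheap pills today", "spam"),      # near-duplicate: already recognised
    ("win a free lottery prize", "spam"),   # new "breed": misclassified, so stored
]

stored = train_on_error(ToyFilter(), messages)
print(len(stored), "of", len(messages), "messages stored")
```

In this toy run the repeated and near-duplicate "pills" messages are skipped
once the first one is in the database, while the "lottery" message, which
shares no words with it, gets added.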


-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





