New new script to train bogofilter
Peter Bishop
pgb at adelard.com
Fri Jul 4 15:10:32 CEST 2003
On 4 Jul 2003 at 7:06, Greg Louis wrote:
> So the problem you may see down the road if you train "to extinction"
> on errors is that bogofilter will get very good at recognizing the
> types of messages on which you train, and rather poor at recognizing
> messages that are similar to, but not strongly similar to, those
> training ones. It's like trying to recognize dogs by training on
> German shepherds only: a great Dane shares the "dog" characteristics,
> but we'll have learned too many specific "German shepherd" ones so we
> may well misclassify the great Dane.
Hmm
I think what Boris is doing can be put another way, i.e.:
"If you could only put some fixed number of messages e.g. 1000
into the database - which ones would you choose out of a larger population
of messages?"
So, following your dog analogy, the train-on-error procedure only puts a
message into the database if it looks different from what is already there
(i.e. the filter currently gets it wrong). That is like putting only
different breeds of dog into a picture book.
So you end up with a picture book full of different breeds, rather than a
picture book where there are lots of dogs of the same type while other
breeds are missed completely.
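To make the selection rule concrete, here is a minimal sketch of train-on-error with a fixed database budget. It assumes a toy token-count scorer: `ToyFilter`, `train_on_error`, `budget` and `cutoff` are hypothetical names for illustration, and the scoring is NOT bogofilter's actual spamicity calculation (a real training script would call bogofilter itself).

```python
from collections import Counter


class ToyFilter:
    """Hypothetical token-count classifier (NOT bogofilter's algorithm)."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.trained = 0  # number of messages added to the "database"

    def score(self, text):
        """Crude spamminess in [0, 1]; exactly 0.5 means 'no evidence either way'."""
        words = set(text.lower().split())
        spam = sum(self.spam_tokens[w] for w in words)
        ham = sum(self.ham_tokens[w] for w in words)
        if spam + ham == 0:
            return 0.5
        return spam / (spam + ham)

    def register(self, text, is_spam):
        """Add each distinct token of the message to the spam or ham counts."""
        target = self.spam_tokens if is_spam else self.ham_tokens
        target.update(set(text.lower().split()))
        self.trained += 1


def train_on_error(filt, corpus, budget, cutoff=0.5):
    """Register only messages the filter gets wrong or cannot decide on,
    stopping once `budget` messages have been added."""
    for text, is_spam in corpus:
        if filt.trained >= budget:
            break
        s = filt.score(text)
        predicted_spam = s > cutoff
        # Messages already classified correctly look "like the rest of the
        # picture book" and are skipped.
        if s == cutoff or predicted_spam != is_spam:
            filt.register(text, is_spam)
    return filt


if __name__ == "__main__":
    corpus = [
        ("cheap pills buy now", True),
        ("meeting agenda for monday", False),
        ("cheap pills buy now", True),     # repeat: already recognised, skipped
        ("buy cheap watches now", True),   # similar spam: also skipped
        ("monday meeting moved", False),
        ("buy now", False),                # ham that looks spammy: an error, so trained
    ]
    f = train_on_error(ToyFilter(), corpus, budget=1000)
    print(f.trained)  # 3
```

Of the six messages, only three reach the database: the first spam, the first ham, and the ham the filter misjudged. The near-duplicate spams add nothing new, so they are left out, which is the "different breeds only" effect described above.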
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk