mailing lists and hapaxes

Peter Bishop pgb at adelard.com
Thu Sep 25 09:46:30 CEST 2003


On 25 Sep 2003 at 8:46, Boris 'pi' Piwinger wrote:

> >My thinking here is that randomly deleting hapaxes is dangerous, because
> >you don't know if they're about to turn into real tokens. But if
> >they've remained a hapax for a month, it's pretty unlikely you'll see
> >another one of them, so you can fairly safely kill it.
> 
> So if you don't train with this token, because it was good
> enough, this would get the token removed. Not so good.
> 
Yes indeed,
Maintenance based on date or count assumes that
*all* messages will be added to the database.

For those of us who build minimal databases
(e.g. via train-on-error, bogominitrain or whatever),
this is not the case.

An old, singleton token could be doing a fine job,
but there is no easy way of finding out.
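
To make the point concrete, here is a minimal sketch in Python. It is
not bogofilter code: the names TokenStat, train, prune_old_hapaxes and
simulate, the token "cheapmeds" and the 30-day cutoff are all assumptions
invented for this illustration. It models a database that records, per
token, a count and the day it was last trained on, and compares full
training with train-on-error when the same token turns up every day.

```python
from dataclasses import dataclass


@dataclass
class TokenStat:
    count: int         # number of trained messages containing the token
    last_trained: int  # day number on which the token was last trained


def train(db, tokens, day):
    """Register the tokens of one message in the (hypothetical) database."""
    for tok in tokens:
        stat = db.get(tok)
        if stat is None:
            db[tok] = TokenStat(count=1, last_trained=day)
        else:
            stat.count += 1
            stat.last_trained = day


def prune_old_hapaxes(db, today, max_age=30):
    """Return a new dict without singleton tokens older than max_age days."""
    return {
        tok: stat
        for tok, stat in db.items()
        if not (stat.count == 1 and today - stat.last_trained > max_age)
    }


def simulate(train_on_error, days=40):
    """Feed one message containing 'cheapmeds' per day, then prune."""
    db = {}
    train(db, ["cheapmeds"], day=0)  # the initial (and maybe only) training
    for day in range(1, days + 1):
        # Assumption: the message is classified correctly whenever the
        # token is already known to the database.
        classified_correctly = "cheapmeds" in db
        if not train_on_error or not classified_correctly:
            train(db, ["cheapmeds"], day)
    return prune_old_hapaxes(db, today=days)


full = simulate(train_on_error=False)
minimal = simulate(train_on_error=True)
print("full training keeps token: ", "cheapmeds" in full)
print("train-on-error keeps token:", "cheapmeds" in minimal)
```

Under full training the count and date are refreshed daily, so the
token survives. Under train-on-error the token is used every day but
never retrained, so it still looks like a 40-day-old hapax and the
date-based rule deletes it.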


-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk