what happens if I discard tokens that occur only once?

Chris Fortune cfortune at telus.net
Sat Jun 4 00:51:01 CEST 2005


> = 1) decay over time. That is: how often do single count tokens become
> registered at least one more time? But this says nothing about how often
> the token is being read. It may have been registered only once but still
> be providing useful information in the calculation.

It would be good to know the last time each token was read; that way, "useless" tokens (ones that are never consulted) could be timed out automatically.
There is also the theoretical possibility of optimizing the database so that the most frequently read tokens are queried first.
Of course, the processing and disk overhead of this kind of house-keeping (every read would also become a write) would have to be weighed against the benefits.
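
To make the idea concrete, here is a rough sketch in Python. It is not how bogofilter stores its wordlist; the TokenStore class, its method names, and the in-memory dict are all made up for illustration. Each token carries a registration count, a last-read timestamp, and a read counter, so that idle single-count tokens can be expired and the most-read tokens can be listed.

```python
import time


class TokenStore:
    """Hypothetical in-memory token store (an assumption for illustration,
    not bogofilter's actual database layout)."""

    def __init__(self):
        # token -> [registration count, last-read timestamp, read count]
        self._tokens = {}

    def register(self, token, now=None):
        """Count one more occurrence of the token in training mail."""
        now = time.time() if now is None else now
        entry = self._tokens.setdefault(token, [0, now, 0])
        entry[0] += 1

    def read(self, token, now=None):
        """Return the token's count and note that it was consulted."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        if entry is None:
            return 0
        # This extra write on every lookup is the house-keeping overhead.
        entry[1] = now
        entry[2] += 1
        return entry[0]

    def expire(self, max_idle, now=None, max_count=1):
        """Drop tokens with a low count that have not been read recently."""
        now = time.time() if now is None else now
        stale = [
            token
            for token, (count, last_read, _reads) in self._tokens.items()
            if count <= max_count and now - last_read > max_idle
        ]
        for token in stale:
            del self._tokens[token]
        return stale

    def most_read(self, n):
        """Return the n tokens that have been looked up most often."""
        ranked = sorted(self._tokens, key=lambda t: self._tokens[t][2], reverse=True)
        return ranked[:n]
```

With this layout, a token registered once but read often survives expire(), which addresses the point in the quoted text: a single-count token may still be contributing to the calculation. The cost shows up in read(), where a lookup that used to be read-only now modifies the record.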





More information about the Bogofilter mailing list