what happens if I discard tokens that occur only once?

R Kimber rkimber at ntlworld.com
Fri Jun 3 23:56:36 CEST 2005


On Fri, 3 Jun 2005 17:47:04 -0400
David Relson <relson at osagesoftware.com> wrote:


> Hapax importance depends (in part) on how registration is handled.  If
> _every_ message goes into the wordlist and there's a hapax that is
> (say) timestamped 18 months ago then you _know_ the token hasn't
> appeared more recently.  (If it had appeared more recently, the
> timestamp would be more recent.)  In that case, one knows well the
> meaning of discarding count==1 and date<today-18m.  If only some
> messages get registered, then one has no additional info about the
> hapax.
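
To make the quoted rule concrete, here is a minimal sketch of
"discard count==1 and date<today-18m". It is not bogofilter code: the
in-memory dict standing in for the wordlist, the function name
prune_stale_hapaxes, and the 548-day approximation of 18 months are all
assumptions made for illustration.

```python
from datetime import date, timedelta


def prune_stale_hapaxes(wordlist, today, max_age_days=548):
    """Return a copy of wordlist without stale hapaxes.

    wordlist maps token -> (count, last_seen_date). This layout is a
    hypothetical stand-in, not bogofilter's on-disk format.
    A token is dropped only if it was seen exactly once AND its
    timestamp is older than the cutoff (roughly 18 months by default).
    """
    cutoff = today - timedelta(days=max_age_days)
    return {
        token: (count, stamp)
        for token, (count, stamp) in wordlist.items()
        if not (count == 1 and stamp < cutoff)
    }


# Made-up sample data for illustration only.
wordlist = {
    "viagra": (250, date(2005, 6, 1)),      # frequent, recent: kept
    "xyzzy": (1, date(2003, 11, 2)),        # hapax, stale: dropped
    "frobnicate": (1, date(2005, 5, 30)),   # hapax, recent: kept
    "oldcommon": (12, date(2003, 1, 1)),    # stale but not a hapax: kept
}

pruned = prune_stale_hapaxes(wordlist, today=date(2005, 6, 3))
print(sorted(pruned))
```

As the quoted text says, dropping such a token is only well understood
when every message is registered, so that the timestamp really does
record the last appearance.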

Presumably you only know it hasn't appeared more recently if you don't
use the thresh_update parameter? Or does the parameter cause the
timestamp to be updated but nothing to be added?

- Richard.
-- 
Richard Kimber
http://www.psr.keele.ac.uk/



More information about the Bogofilter mailing list