Wordlist Histogram [was: What did I do wrong? ]
David Relson
relson at osagesoftware.com
Thu Feb 19 22:54:07 CET 2004
On Thu, 19 Feb 2004 14:51:21 +0100
Boris 'pi' Piwinger wrote:
> David Relson wrote:
>
> [bogoutil -H]
> > hapaxes: ham 375505 (29.72%), spam 443797 (35.12%)
> > pure: ham 562881 (44.55%), spam 616022 (48.75%)
>
> What is the meaning of pure? Tokens which have been seen
> only once for one category, but possibly many times in the
> other?
Hapaxes have a total ham+spam count of 1. "Pure" means that either the
ham count or the spam count is 0, so a pure token may have been seen
many times, but only in one category. Given this, all hapaxes are
"pure". I'm open to suggestions for better labels :-)
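
To make the two definitions concrete, here is a minimal sketch in Python.
It is not bogoutil's actual code: the names classify() and histogram(),
and the toy wordlist, are made up for illustration. It also assumes that
every token in the wordlist has been seen at least once, and that the
ham/spam split in the output refers to the side with the nonzero count.

```python
def classify(ham, spam):
    """Return the labels that apply to a token with these counts.

    Assumes ham + spam >= 1, i.e. the token is actually in the wordlist.
    """
    labels = set()
    if ham + spam == 1:          # seen exactly once overall
        labels.add("hapax")
    if ham == 0 or spam == 0:    # seen in only one category
        labels.add("pure")
    return labels


def histogram(wordlist):
    """Tally hapax and pure tokens, split by the category they appear in.

    wordlist maps token -> (ham_count, spam_count).
    """
    counts = {"hapax": {"ham": 0, "spam": 0},
              "pure": {"ham": 0, "spam": 0}}
    for ham, spam in wordlist.values():
        side = "ham" if ham > 0 else "spam"   # only used for labelled tokens
        for label in classify(ham, spam):
            counts[label][side] += 1
    return counts


# Hypothetical sample data, not taken from a real wordlist.
sample = {
    "meeting": (5, 0),   # pure (ham), not a hapax
    "viagra":  (0, 7),   # pure (spam), not a hapax
    "hello":   (3, 2),   # neither
    "relson":  (1, 0),   # hapax, and therefore also pure
    "xyzzy":   (0, 1),   # hapax, and therefore also pure
}
result = histogram(sample)
```

With this sample, both hapaxes are also counted under "pure", which is
why the "pure" figures can never be smaller than the "hapaxes" figures.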
> BTW: The option is not in the man page.
Another detail to take care of :-<