Wordlist Histogram [was: What did I do wrong?]
Tom Allison
tallison at tacocat.net
Fri Feb 20 01:07:48 CET 2004
David Relson wrote:
> On Thu, 19 Feb 2004 14:51:21 +0100
> Boris 'pi' Piwinger wrote:
>
>
>>David Relson wrote:
>>
>>[bogoutil -H]
>>
>>>hapaxes: ham 375505 (29.72%), spam 443797 (35.12%)
>>> pure: ham 562881 (44.55%), spam 616022 (48.75%)
>>
>>What is the meaning of pure? Tokens which have been seen
>>only once for one category, but possibly many times in the
>>other?
>
>
> hapaxes have a total ham+spam count of 1. "pure" indicates either ham
> or spam is 0. Given this, all hapaxes are "pure". I'm open to
> suggestions for better labels :-)
>
How about "hetero" and "homo" prefixes on something, to indicate mixed
(spam and ham) presence versus singular, or pure, presence?
I'm curious what these values indicate.
How should I interpret them correctly?
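Here is a minimal sketch of the histogram buckets, using the definitions quoted above: a hapax has a ham+spam count of 1, and a "pure" token has a ham or spam count of 0. It assumes a hypothetical wordlist held as a Python dict of token -> (ham_count, spam_count). It also assumes the percentages are taken over the total number of tokens in the wordlist. That assumption fits the quoted figures, since 375505/29.72% and 443797/35.12% both come to roughly 1.26 million. This is not bogoutil's actual code. The "mixed" bucket is the "hetero" case suggested above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical wordlist layout (an assumption, not bogofilter's format):
# token -> (ham_count, spam_count)
Wordlist = Dict[str, Tuple[int, int]]


def classify(ham: int, spam: int) -> List[str]:
    """Return the histogram buckets a single token falls into."""
    labels = []
    if ham + spam == 1:
        # Hapax: seen exactly once in total, so it is necessarily pure too.
        labels.append("hapax_ham" if ham == 1 else "hapax_spam")
    if ham > 0 and spam == 0:
        labels.append("pure_ham")
    elif spam > 0 and ham == 0:
        labels.append("pure_spam")
    elif ham > 0 and spam > 0:
        # Seen in both categories ("hetero").
        labels.append("mixed")
    return labels


def histogram(wordlist: Wordlist) -> Dict[str, Tuple[int, float]]:
    """Count tokens per bucket, with each count as a percentage of all tokens."""
    counts: Counter = Counter()
    for ham, spam in wordlist.values():
        counts.update(classify(ham, spam))
    total = len(wordlist)
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    sample = {
        "viagra": (0, 1),   # spam hapax, pure spam
        "meeting": (1, 0),  # ham hapax, pure ham
        "free": (3, 7),     # mixed
        "lunch": (5, 0),    # pure ham
        "click": (0, 2),    # pure spam
    }
    for label, (n, pct) in sorted(histogram(sample).items()):
        print(f"{label}: {n} ({pct:.2f}%)")
```

Read this way, the hapax buckets are subsets of the pure buckets. Everything that is neither pure ham nor pure spam is mixed. Under the same assumption, the quoted figures would leave roughly 100 - 44.55 - 48.75 = 6.70% of tokens that have been seen in both ham and spam.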
More information about the Bogofilter mailing list