Wordlist Histogram [was: What did I do wrong? ]

Tom Allison tallison at tacocat.net
Fri Feb 20 01:07:48 CET 2004


David Relson wrote:
> On Thu, 19 Feb 2004 14:51:21 +0100
> Boris 'pi' Piwinger wrote:
> 
> 
>>David Relson wrote:
>>
>>[bogoutil -H]
>>
>>>hapaxes:  ham  375505 (29.72%), spam  443797 (35.12%)
>>>   pure:  ham  562881 (44.55%), spam  616022 (48.75%)
>>
>>What is the meaning of pure? Tokens which have been seen
>>only once for one category, but possibly many times in the
>>other?
> 
> 
> hapaxes have a total ham+spam count of 1.  "pure" indicates either ham
> or spam is 0.  Given this, all hapaxes are "pure".  I'm open to
> suggestions for better labels :-)
> 

Perhaps "hetero" and "homo" prefixes on some term, to indicate a mixed 
presence (both spam and ham) versus a singular, or pure, presence.

I'm curious as to what these values indicate.
How would I interpret this correctly?
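
For reference, the two definitions David quotes above (a hapax has a
ham+spam count of 1; a "pure" token has a ham or spam count of 0) can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not bogoutil's actual code: the function name tally,
the dictionary layout, and the sample tokens are all hypothetical.

```python
from collections import Counter

def tally(wordlist):
    """Count hapax and pure tokens, split by ham and spam.

    wordlist maps token -> (ham_count, spam_count). This layout is an
    assumption for illustration, not bogofilter's on-disk format.
    """
    stats = Counter()
    for ham, spam in wordlist.values():
        # Hapax: the token has been seen exactly once in total.
        if ham + spam == 1:
            stats["hapax_ham" if ham else "hapax_spam"] += 1
        # Pure: the token has only ever appeared on one side.
        if ham > 0 and spam == 0:
            stats["pure_ham"] += 1
        elif spam > 0 and ham == 0:
            stats["pure_spam"] += 1
        elif ham > 0 and spam > 0:
            stats["mixed"] += 1
    return stats

# Made-up sample data, not taken from any real wordlist.
sample = {
    "meeting": (7, 0),
    "xyzzy": (1, 0),
    "viagra": (0, 12),
    "qwerty": (0, 1),
    "free": (3, 9),
}
stats = tally(sample)
for key in sorted(stats):
    print(key, stats[key])
```

Every hapax also lands in one of the pure buckets, which matches the
remark that all hapaxes are "pure". Only "free" counts as mixed here.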




