bogotune results

Thu Mar 25 04:02:17 CET 2004

David Relson wrote:
> On Wed, 24 Mar 2004 21:20:24 -0500
> Tom Allison wrote:
> 
> 
>>Boris 'pi' Piwinger wrote:
>>
>>>
>>>Right. OTOH you can see any unsure as a failure.
>>>
>>
>>I would think counting any Unsure as a failure is expecting a bit much.
>>
>>If my assumption of what bogotune does is correct, I would have
>>assumed a ham_cutoff just above the highest scoring ham and a
>>spam_cutoff just below the lowest scoring spam.
> 
> 
> Since a ham message _can_ score higher than a spam message, and vice
> versa, this may not be what you want.  Consider ham_cutoff to be a bit
> less than the lowest scoring spam and the spam_cutoff to be a bit higher
> than the highest scoring ham.  That leaves an unsure range corresponding
> to spam that scores lower than ham.

I'm working on the assumption that my archive of spam/ham has already
been tuned/trained to the point where the scores are separated into
two distinct ranges, which can be represented successfully with
distinct ham_cutoff (highest_ham+) and spam_cutoff (lowest_spam-) values.
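
To make that assumption concrete, here is a rough Python sketch. It is
not what bogotune actually does; the function names, the margin
parameter and the sample scores are all made up for illustration. It
places ham_cutoff just above the highest ham score and spam_cutoff just
below the lowest spam score, and gives up if the two ranges overlap.
The classify() comparisons are my assumption about how the cutoffs are
applied.

```python
def separated_cutoffs(ham_scores, spam_scores, margin=0.01):
    """Return (ham_cutoff, spam_cutoff), or None if the ranges overlap.

    Hypothetical helper: 'margin' is an illustrative safety gap,
    not a bogofilter or bogotune parameter.
    """
    highest_ham = max(ham_scores)
    lowest_spam = min(spam_scores)
    ham_cutoff = highest_ham + margin    # highest_ham+
    spam_cutoff = lowest_spam - margin   # lowest_spam-
    if ham_cutoff > spam_cutoff:
        # Some spam scores at or below some ham (or too close to it),
        # so no pair of cutoffs separates the archive cleanly.
        return None
    return ham_cutoff, spam_cutoff


def classify(score, ham_cutoff, spam_cutoff):
    """Three-way classification (assumed comparison rules)."""
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"


if __name__ == "__main__":
    ham = [0.01, 0.10, 0.30]     # made-up scores
    spam = [0.70, 0.95, 0.99]    # made-up scores
    cutoffs = separated_cutoffs(ham, spam, margin=0.05)
    print(cutoffs)
    if cutoffs is not None:
        print(classify(0.50, *cutoffs))
```

If the archive really is separated, every archived message lands in Ham
or Spam and only new mail can end up Unsure. If separated_cutoffs()
returns None, the ranges overlap and David's description applies
instead.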

Obviously any future message can cross these cutoffs, but we
trust that, with a sufficient number of tokens, the probability of this
happening becomes increasingly small.

Now that I have some 2800+ ham tokens and 2000+ spam tokens and ~2500
each of ham/spam emails, I would hope to rule out, with certainty, the
chance that a good email will score all the way across the Unsure range
into the Spam group, and likewise that spam will score all the way into
the Ham group.

I have yet to see ham do that, but spam does it fairly regularly (once a 
week).