bogotune results
Tom Allison
tallison at tacocat.net
Thu Mar 25 04:02:17 CET 2004
David Relson wrote:
> On Wed, 24 Mar 2004 21:20:24 -0500
> Tom Allison wrote:
>
>
>>Boris 'pi' Piwinger wrote:
>>
>>>
>>>Right. OTOH you can see any unsure as a failure.
>>>
>>
>>I would think that treating any Unsure as a failure is expecting a little too much.
>>
>>If my assumption of what bogotune does is correct, I would have
>>assumed a ham_cutoff just above the highest scoring ham and a
>>spam_cutoff just below the lowest scoring spam.
>
>
> Since a ham message _can_ score higher than a spam message, and vice
> versa, this may not be what you want. Consider ham_cutoff to be a bit
> less than the lowest scoring spam and the spam_cutoff to be a bit higher
> than the highest scoring ham. That leaves an unsure range corresponding
> to spam that scores lower than ham.
I'm working on the assumption that my archive of spam/ham has already
been tuned/trained to such an extent that the two sets are separated into
two distinct score ranges, and that they can be represented successfully
with distinct ham_cutoff (highest_ham+) and spam_cutoff (lowest_spam-)
values.

Obviously any future message can cross these cutoffs, but I'm trusting
that, with a sufficient number of tokens, the probability of this
happening becomes increasingly small.
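
To make that assumption concrete, here is a minimal Python sketch of how I picture the two cutoffs being derived from an already-separated archive. It is not bogotune's actual algorithm; the function name `separated_cutoffs`, the `margin` parameter, and the sample scores are all made up for illustration.

```python
def separated_cutoffs(ham_scores, spam_scores, margin=0.01):
    """Return (ham_cutoff, spam_cutoff) for fully separated score lists.

    Hypothetical helper: ham_cutoff sits just above the highest-scoring ham,
    spam_cutoff just below the lowest-scoring spam.
    """
    highest_ham = max(ham_scores)
    lowest_spam = min(spam_scores)
    if highest_ham >= lowest_spam:
        # The "two distinct ranges" assumption does not hold for this archive.
        raise ValueError("ham and spam scores overlap")
    # Cap the margin so the two cutoffs can never cross each other.
    step = min(margin, (lowest_spam - highest_ham) / 2)
    return highest_ham + step, lowest_spam - step


# Made-up scores, only to show the shape of the result.
ham_cutoff, spam_cutoff = separated_cutoffs([0.02, 0.10, 0.31], [0.72, 0.95, 0.99])
```

If the archive is not actually separated, the helper raises instead of returning cutoffs, which is the case David describes above.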

Now that I have 2800+ ham tokens, 2000+ spam tokens, and ~2500 each of
ham and spam emails, I would hope to avoid, with certainty, the chance
that a good email will score all the way across the Unsure range into the
Spam group, and similarly that a spam will score all the way into the Ham
group.

I have yet to see ham do that, but spam does it fairly regularly (once a
week).
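
For what it's worth, this is roughly how I count those crossings, written as a small Python sketch. The names `classify` and `count_crossings` are hypothetical, and the comparison rules (at or below ham_cutoff is Ham, at or above spam_cutoff is Spam) are my assumption rather than something taken from the bogofilter source.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Three-way classification of one score (comparison rules are assumed)."""
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"


def count_crossings(ham_scores, spam_scores, ham_cutoff, spam_cutoff):
    """Count messages that cross the whole Unsure range into the wrong group."""
    ham_as_spam = sum(
        1 for s in ham_scores if classify(s, ham_cutoff, spam_cutoff) == "Spam"
    )
    spam_as_ham = sum(
        1 for s in spam_scores if classify(s, ham_cutoff, spam_cutoff) == "Ham"
    )
    return ham_as_spam, spam_as_ham


# Made-up scores: one spam (0.05) lands in the Ham group, no ham lands in Spam.
crossings = count_crossings([0.10, 0.20], [0.90, 0.05], 0.32, 0.71)
```

Messages that land in Unsure are not counted here, since they have not crossed all the way over.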