Result Based on a Single Token

Tue Oct 2 22:21:56 CEST 2007

On Tue, 2 Oct 2007 18:47:48 +0100
RW wrote:

...[snip]...

> Personally, I believe this is a bug, and the details of how I
> triggered it are immaterial.  Software should be tolerant of misuse,
> and should fail-safe.
> 
> > That's the nature of statistics. You have to throw in everything or
> > it doesn't work. Period.
> > 
> 
> And a Bayesian spam filter should not be capable of designating an
> email as spam based on a single token. Period.

Point taken.  A single token is arguably insufficient for
classification.  Perhaps a message with only one significant token
should be rated as "Unsure".

If one token is insufficient, what is the appropriate minimum number
of tokens?  Is it 2, 5, or 10?  Assuming there is a minimum, it ought
to be configurable (as are bogofilter's other parameters).  Once it is
configurable, though, there's no way to prevent a bad configuration.
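
To make the idea concrete, here is a rough sketch of what such a
minimum might look like.  It is not bogofilter code: the function name,
the min_tokens parameter and the default values are all hypothetical,
and the scoring is a plain naive-Bayes log-odds combination rather than
the method bogofilter actually uses.

```python
import math


def classify(token_probs, min_tokens=2, min_dev=0.1,
             spam_cutoff=0.99, ham_cutoff=0.45):
    """Classify a message from per-token spam probabilities.

    Hypothetical sketch: all names and defaults are assumptions
    chosen for illustration, not bogofilter's real options.
    """
    # A token is "significant" if its probability is far enough from 0.5.
    significant = [p for p in token_probs if abs(p - 0.5) >= min_dev]

    # Too little evidence: refuse to decide, whatever the tokens say.
    if len(significant) < min_tokens:
        return "Unsure"

    # Combine the evidence as a sum of log-odds (naive Bayes),
    # clamping probabilities so that log() never sees 0.
    eps = 1e-6
    log_odds = 0.0
    for p in significant:
        p = min(max(p, eps), 1.0 - eps)
        log_odds += math.log(p) - math.log(1.0 - p)

    # Numerically stable logistic function to get back a score in (0, 1).
    if log_odds >= 0:
        score = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        score = e / (1.0 + e)

    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"
```

With the assumed default of min_tokens=2, classify([0.999]) is rated
"Unsure" no matter how spammy the lone token is.  Setting min_tokens=1
brings back the single-token verdict, which is exactly the kind of bad
configuration that cannot be ruled out.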

HTH,

David