Result Based on a Single Token
David Relson
relson at osagesoftware.com
Tue Oct 2 22:21:56 CEST 2007
On Tue, 2 Oct 2007 18:47:48 +0100
RW wrote:
...[snip]...
> Personally, I believe this is a bug, and the details of how I
> triggered it are immaterial. Software should be tolerant of misuse,
> and should fail-safe.
>
> > That's the nature of statistics. You have to throw in everything or
> > it doesn't work. Period.
> >
>
> And a Bayesian spam filter should not be capable of designating an
> email as spam based on a single token. Period.
Point taken. It can be argued that a single token is insufficient for
classification. Perhaps a message with only 1 significant token
should be rated as "Unsure".
If 1 token is insufficient, what is the appropriate minimum number of
tokens? Is it 2, 5, or 10? Assuming there is a minimum, it ought to
be configurable (as are bogofilter's other parameters). Given
configurability, there's no way to prevent a bad configuration.
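To make the trade-off concrete, here is a minimal Python sketch. It assumes a Robinson-Fisher (inverse chi-square) combination of per-token spam probabilities, similar in spirit to what bogofilter does, but it is not bogofilter's code. The minimum-token gate and its parameter `min_tokens` are hypothetical and are not existing bogofilter options. `min_dev` and the two cutoffs only loosely mirror bogofilter parameters of similar names, and the numbers used are illustrative. The sketch shows both sides of the argument. With `min_tokens=2`, a message whose only significant token is strongly spammy is rated "Unsure". With `min_tokens=1` (a "bad" setting the filter cannot forbid), that single token decides the verdict.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of a chi-square variable (even dof only)."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(probs):
    """Combine token probabilities with Fisher's method (Robinson style).

    Assumes every p is strictly between 0 and 1 (real filters smooth
    token probabilities so they never reach 0 or 1).
    """
    n = len(probs)
    # Sums of logs stand in for products, to avoid floating-point underflow.
    ln_not_spam = sum(math.log(1.0 - p) for p in probs)
    ln_spam = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)  # near 1 for spammy tokens
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)      # near 1 for hammy tokens
    return (1.0 + s - h) / 2.0

def classify(token_probs, min_dev=0.1, min_tokens=1,
             spam_cutoff=0.95, ham_cutoff=0.10):
    """Hypothetical classifier with a minimum count of significant tokens."""
    # A token is "significant" if its probability is far enough from 0.5.
    significant = [p for p in token_probs if abs(p - 0.5) >= min_dev]
    if not significant or len(significant) < min_tokens:
        return "Unsure", 0.5
    score = fisher_score(significant)
    if score >= spam_cutoff:
        return "Spam", score
    if score <= ham_cutoff:
        return "Ham", score
    return "Unsure", score

# Only 0.99 deviates from 0.5 by at least min_dev, so one token is significant.
one_token = [0.99, 0.5, 0.52]
print(classify(one_token, min_tokens=1)[0])  # Spam
print(classify(one_token, min_tokens=2)[0])  # Unsure
```

With a single significant token, Fisher's method simply returns that token's probability (0.99 here), which is why one strong token can carry the whole verdict unless a minimum is enforced.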
HTH,
David