Result Based on a Single Token

Tue Oct 2 21:40:33 CEST 2007

On Tue, 2 Oct 2007 19:29:55 +0100
John G Walker <johngeoffreywalker at yahoo.co.uk> wrote:

> 
> 
> On Tue, 2 Oct 2007 18:47:48 +0100 RW <fbsd06 at mlists.homeunix.com>
> wrote:
> 
> > Personally, I believe this is a bug
> 
> Your personal beliefs are, of course, your own business. However,
> bogofilter works beautifully because it is a Bayesian filter, pure and
> simple. If you want a spam filter based on some other technique then
> you shouldn't be using bogofilter. It's that simple.

A pure Bayesian filter would include all tokens. Pragmatically,
real-world implementations have to prune most of the tokens to protect
against poisoning, but pruning down to one is just wrong, and should
never happen.

The fact that I learned a subset of the mailing list ham is a
red herring. There is no substantial difference between learning a
subset of a mailing list and learning a lower-volume list.

The email had a wealth of ham indicators that Bogofilter ignored. If it
had been forced to use more than one token the filter would have
behaved correctly, in spite of Tuffmail's aggressive tuning.
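
To make the point concrete, here is a rough Python sketch. It is not
Bogofilter's actual code or scoring method: it uses a plain naive-Bayes
product, and the names select_tokens, combine, min_dev and min_tokens,
as well as the token probabilities, are all made up for illustration.
It only shows how an aggressive deviation cutoff can leave a single
token to decide the result, and how a floor on the token count changes
the outcome.

```python
from math import prod  # requires Python 3.8+


def select_tokens(probs, min_dev, min_tokens=1):
    """Prune tokens by how far their spam probability is from 0.5.

    probs maps token -> estimated spam probability (hypothetical values).
    Tokens with |p - 0.5| < min_dev are dropped. If fewer than min_tokens
    survive, fall back to the min_tokens most extreme tokens instead.
    """
    ranked = sorted(probs.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)
    kept = [(tok, p) for tok, p in ranked if abs(p - 0.5) >= min_dev]
    if len(kept) < min_tokens:
        kept = ranked[:min_tokens]
    return dict(kept)


def combine(probs):
    """Naive-Bayes combination of per-token spam probabilities."""
    spam = prod(probs.values())
    ham = prod(1.0 - p for p in probs.values())
    return spam / (spam + ham)


# Invented example: one strongly spammy token among several hammy ones.
example = {"offer": 0.99, "list-id": 0.10, "freebsd": 0.10,
           "patch": 0.10, "kernel": 0.15}

# Aggressive cutoff: only "offer" survives, so it alone decides the score.
single = select_tokens(example, min_dev=0.45)
print(sorted(single), round(combine(single), 3))

# Same cutoff, but at least five tokens must be used.
several = select_tokens(example, min_dev=0.45, min_tokens=5)
print(sorted(several), round(combine(several), 3))
```

With these invented numbers the single-token case scores the message as
almost certainly spam, while using all five tokens puts it well below
0.5, which is the behaviour I would have expected from the filter.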