Result Based on a Single Token

RW fbsd06 at mlists.homeunix.com
Tue Oct 2 21:40:33 CEST 2007


On Tue, 2 Oct 2007 19:29:55 +0100
John G Walker <johngeoffreywalker at yahoo.co.uk> wrote:

> 
> 
> On Tue, 2 Oct 2007 18:47:48 +0100 RW <fbsd06 at mlists.homeunix.com>
> wrote:
> 
> > Personally, I believe this is a bug
> 
> Your personal beliefs are, of course, your own business. However,
> bogofilter works beautifully because it is a Bayesian filter, pure and
> simple. If you want a spam filter based on some other technique then
> you shouldn't be using bogofilter. It's that simple.

A pure Bayesian filter would include all tokens. Pragmatically,
real-world implementations have to prune most of the tokens to protect
against poisoning, but pruning down to a single token is just wrong and
should never happen.

The fact that I learned a subset of the mailing list ham is a
red herring. There is no substantial difference between learning a
subset of a mailing list and learning a lower-volume list.

The email had a wealth of ham indicators that Bogofilter ignored. If it
had been forced to use more than one token, the filter would have
behaved correctly, in spite of Tuffmail's aggressive tuning.
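
To illustrate the point, here is a toy sketch in Python. It is not
Bogofilter's code and it does not reproduce Bogofilter's actual scoring
method; it uses a plain naive-Bayes combination, and the names
select_tokens, combine, min_deviation and min_tokens, as well as the
token probabilities, are made up for the example. It only shows how an
aggressive cutoff can leave a single token in charge, and how requiring
a minimum number of tokens lets the ham indicators count.

```python
import math

def select_tokens(probs, min_deviation, min_tokens):
    """Keep tokens whose spam probability is at least min_deviation
    away from the neutral value 0.5. If fewer than min_tokens survive,
    fall back to the min_tokens most extreme tokens instead."""
    ranked = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)
    kept = [p for p in ranked if abs(p - 0.5) >= min_deviation]
    if len(kept) < min_tokens:
        kept = ranked[:min_tokens]
    return kept

def combine(probs):
    """Naive-Bayes combination of per-token spam probabilities,
    computed in log space to avoid underflow."""
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical message: one very spammy token, five moderately hammy ones.
tokens = [0.99, 0.20, 0.15, 0.25, 0.20, 0.10]

# Aggressive cutoff, no floor: only the 0.99 token survives.
single = combine(select_tokens(tokens, min_deviation=0.45, min_tokens=1))

# Same cutoff, but at least six tokens must take part in the score.
several = combine(select_tokens(tokens, min_deviation=0.45, min_tokens=6))

print(f"one token:  {single:.3f}")
print(f"six tokens: {several:.3f}")
```

With only the single surviving token the score is simply that token's
own probability, so the message looks like spam. Once the other five
tokens are allowed in, the combined score drops well below 0.5.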

