Result Based on a Single Token

John G Walker johngeoffreywalker at yahoo.co.uk
Tue Oct 2 19:03:49 CEST 2007



On Tue, 2 Oct 2007 17:35:07 +0100 RW <fbsd06 at mlists.homeunix.com> wrote:

> The reason why this particular mail was detected as spam is that I
> don't train on all unsure results in mailing lists.

That's your problem, then.

> The reason I don't learn all unsure mails in mailing lists is that
> mailing lists are one of the few cases where spammers have access to
> high-quality ham text, and I'm concerned that one day they may
> exploit that. Consequently I don't like to let lists dominate my ham
> corpus. 

If you pick and choose which observations go into a Bayesian (or,
indeed, classical statistical) database, you get screwy results.

That's the nature of statistics. You have to throw in everything or it
doesn't work. Period. As you've discovered.
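
To put rough numbers on that, here is a minimal sketch in Python. It uses
a simplified per-token estimate (the token's relative frequency in spam
divided by its combined relative frequency in spam and ham), which is an
assumption for illustration and not bogofilter's actual calculation. The
function name and all the counts are made up.

```python
def token_spam_prob(spam_hits, n_spam, ham_hits, n_ham):
    """Simplified per-token spam estimate (illustrative only).

    Compares how often the token shows up in spam with how often it
    shows up in ham, each relative to the size of its own corpus.
    """
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical counts for one token that is common on a mailing list.
n_spam = 1000
spam_hits = 20  # the token appears in 20 of 1000 spam messages

# Training on everything: 1000 ham messages, 200 of them contain the token.
full = token_spam_prob(spam_hits, n_spam, ham_hits=200, n_ham=1000)

# Selective training: 150 list messages containing the token were never
# registered as ham, so only 50 of the 850 trained ham messages contain it.
selective = token_spam_prob(spam_hits, n_spam, ham_hits=50, n_ham=850)

print(f"trained on everything: {full:.2f}")       # about 0.09
print(f"trained selectively:   {selective:.2f}")  # about 0.25
```

With these made-up counts the token looks nearly three times as spammy
once the list ham is held back, even though the spam corpus has not
changed at all.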

-- 
 All the best,
 John



More information about the Bogofilter mailing list