Result Based on a Single Token
John G Walker
johngeoffreywalker at yahoo.co.uk
Tue Oct 2 19:03:49 CEST 2007
On Tue, 2 Oct 2007 17:35:07 +0100 RW <fbsd06 at mlists.homeunix.com> wrote:
> The reason why this particular mail was detected as spam is that I
> don't train on all unsure results in mailing lists.
That's your problem, then.
> The reason I don't learn all unsure mails in mailing lists is that
> mailing lists are one of the few cases where spammers have access to
> high-quality ham text, and I'm concerned that one day they may
> exploit that. Consequently I don't like to let lists dominate my ham
> corpus.
If you try to pick and choose which observations go into a Bayesian
(or, indeed, classical statistics) database then you get screwy results.
That's the nature of statistics. You have to throw in everything or it
doesn't work. Period. As you've discovered.
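
To put rough numbers on that, here is a small sketch with made-up
counts. The function name and every figure in it are hypothetical, and
the formula is a simplified relative-frequency estimate for one token,
not bogofilter's actual calculation. It only shows how withholding list
ham from training can push a single token from hammy to spammy:

```python
def token_spam_prob(spam_with, spam_total, ham_with, ham_total):
    """Simplified per-token spam estimate from relative frequencies.

    Illustrative only: this is not bogofilter's real scoring formula.
    """
    spam_rate = spam_with / spam_total  # fraction of spam containing the token
    ham_rate = ham_with / ham_total     # fraction of ham containing the token
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts for one token that is common on a mailing list.
# Full training: every message is registered, list ham included.
full = token_spam_prob(spam_with=5, spam_total=100, ham_with=90, ham_total=300)

# Selective training: most list ham is withheld, so the token is rarely
# seen in the ham that does get registered. The spam side is unchanged.
selective = token_spam_prob(spam_with=5, spam_total=100, ham_with=2, ham_total=100)

print(f"full training:      {full:.3f}")
print(f"selective training: {selective:.3f}")
```

The spam counts are identical in both cases. Only the ham sample
changed, and that alone is enough to move the token's estimate from
below 0.5 to above it.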
--
All the best,
John