better Bayesian bogofilter

Wed Aug 13 08:44:23 CEST 2003

On 12 Aug 2003 at 16:58, Matthias Andree wrote:

> I'd think we should go for the code that doesn't care about the ratio of
> spam to ham used in training. We'd better avoid optimizations that
> depend on the environment or make assumptions about the user.

I agree.

As I don't use the -u option, the spam/ham ratio in my database
is not the same as the spam/ham ratio of my incoming messages.

I use a combination of train-on-error and a "honeytrap" spam
feed to update my database, and currently the ratios are:

database (spam:ham)	3:1
message (spam:ham)	1:1

But different ratios can occur with any selective training regime
(randomtrain, bogominitrain, train-on-error, honeytrap spam feeds, etc.).
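
To illustrate why a ratio-dependent calculation is a problem, here is a
minimal sketch in Python. It is not taken from the bogofilter source; the
function names and the token counts are hypothetical, and the 300:100
message counts are only chosen to match the 3:1 database ratio above. It
compares a token score built from raw counts with one built from per-class
frequencies, for a token that is equally common in spam and ham.

```python
def raw_count_score(spam_hits, ham_hits):
    """Token score from raw counts; it shifts with the training ratio."""
    return spam_hits / (spam_hits + ham_hits)


def normalized_score(spam_hits, ham_hits, spam_msgs, ham_msgs):
    """Token score from per-class frequencies; the training ratio cancels."""
    spam_freq = spam_hits / spam_msgs
    ham_freq = ham_hits / ham_msgs
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical database trained at 3:1 (spam:ham).
spam_msgs, ham_msgs = 300, 100

# Hypothetical token seen in 10% of the spam and 10% of the ham.
spam_hits, ham_hits = 30, 10

raw = raw_count_score(spam_hits, ham_hits)
norm = normalized_score(spam_hits, ham_hits, spam_msgs, ham_msgs)

print(f"raw counts: {raw:.2f}")
print(f"normalized: {norm:.2f}")
```

Under these assumed numbers the raw-count score makes a neutral token look
spammy only because the database holds three times as much spam, while the
normalized score stays at the neutral midpoint.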

-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk