better Bayesian bogofilter
Peter Bishop
pgb at adelard.com
Wed Aug 13 08:44:23 CEST 2003
On 12 Aug 2003 at 16:58, Matthias Andree wrote:
> I'd think we should go for the code that doesn't care about the ratio of
> spam to ham used in training. We'd better avoid optimizations that
> depend on the environment or make assumptions about the user.
I agree.
As I don't use the -u option, the spam/ham ratio in my database
is not the same as the spam/ham ratio of my incoming messages.
I use a combination of train-on-error and a "honeytrap" spam
feed to update my database, and currently the ratios are:
database (spam:ham) 3:1
message (spam:ham) 1:1
But different ratios can occur with any selective training regime
(randomtrain, bogominitrain, train-on-error, honeytrap spam feeds, etc.).
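To illustrate how this can happen, here is a rough sketch in Python. It is not bogofilter code, and every number and name in it is made up for the example: I assume 1000 spam and 1000 ham arrive, that train-on-error registers only the misclassified messages, and that every honeytrap message is registered as spam.

```python
from fractions import Fraction

def trained_counts(spam_errors, ham_errors, honeytrap_spam):
    """Return (spam, ham) counts registered in the database.

    Hypothetical model: train-on-error adds only misclassified
    messages, and the honeytrap feed adds spam unconditionally.
    """
    db_spam = spam_errors + honeytrap_spam
    db_ham = ham_errors
    return db_spam, db_ham

def as_ratio(spam, ham):
    """Format a spam:ham ratio in lowest terms, e.g. '3:1'."""
    r = Fraction(spam, ham)
    return f"{r.numerator}:{r.denominator}"

# Illustrative figures only, not measurements.
incoming_spam, incoming_ham = 1000, 1000
db_spam, db_ham = trained_counts(spam_errors=20, ham_errors=10,
                                 honeytrap_spam=10)

incoming_ratio = as_ratio(incoming_spam, incoming_ham)
database_ratio = as_ratio(db_spam, db_ham)

print("message  (spam:ham)", incoming_ratio)   # 1:1
print("database (spam:ham)", database_ratio)   # 3:1
```

The database ratio here is set by the error counts and the honeytrap volume, not by the incoming mix, so code that assumes the two ratios match would be wrong for a setup like this.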
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk