Database maintenance with combined wordlist.

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Sat Sep 27 13:27:27 CEST 2003


"Greg McCann" <greg at cambria.com> wrote:

>I do this to keep my wordlists fresh and to keep them from growing too large.  The reason I used different expiration times is that I have far more spam than ham in my wordlists.  I want to keep the wordlists roughly the same size

I only use train-on-error and have the same number of ham
and spam messages to train with. For quite some time, the
messages actually used for training have had a ratio very
close to two spam for one ham message. This could depend on
the parameters, but it is an interesting observation. Most
interestingly, I once changed the security margin for
training from symmetric to asymmetric, but the ratio did not
change.
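
To make "security margin" concrete, here is a minimal sketch
of the train-on-error decision in Python. The function name,
the 0.5 cutoff and the margin values are assumptions chosen
for illustration; they are not bogofilter's code or defaults.

```python
def needs_training(score, is_spam, cutoff=0.5,
                   spam_margin=0.1, ham_margin=0.1):
    """Decide whether a message should be fed to the trainer.

    score is the filter's spam score in [0, 1]. A message is
    trained on when it is misclassified or when it is classified
    correctly but lies inside the security margin around the
    cutoff. All names and numbers here are hypothetical.
    """
    if is_spam:
        # Spam must score clearly above the cutoff to be skipped.
        return score < cutoff + spam_margin
    # Ham must score clearly below the cutoff to be skipped.
    return score > cutoff - ham_margin


def count_trained(messages, **margins):
    """Count (spam, ham) messages selected for training.

    messages is an iterable of (score, is_spam) pairs. Scores are
    taken as fixed; a real train-on-error run rescores after each
    training step, which this sketch leaves out.
    """
    spam = ham = 0
    for score, is_spam in messages:
        if needs_training(score, is_spam, **margins):
            if is_spam:
                spam += 1
            else:
                ham += 1
    return spam, ham
```

Equal values for spam_margin and ham_margin give the
symmetric case; different values give the asymmetric one.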

pi



