Database maintenance with combined wordlist.
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Sat Sep 27 13:27:27 CEST 2003
"Greg McCann" <greg at cambria.com> wrote:
>I do this to keep my wordlists fresh and to keep them from growing too large. The reason I used different expiration times is that I have far more spam than ham in my wordlists. I want to keep the wordlists roughly the same size
I use train-on-error only and have the same number of ham
and spam messages available for training. For quite some time,
the messages actually used for training have been at a ratio
very close to two spam for each ham message. This could depend
on the parameters, but it is an interesting observation. Most
interestingly, I once changed the security margin for training
from symmetric to asymmetric, but the ratio did not change.
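
To make the terms concrete, here is a minimal Python sketch of
train-on-error with a security margin. It is only an illustration:
ToyFilter, its scoring formula, and the cutoff, spam_margin and
ham_margin parameters are hypothetical names invented for this example.
They are not bogofilter's algorithm or its options. A message is
registered only when it is misclassified or scores within the margin
on its own side of the cutoff, and the function counts how many spam
and ham messages were actually used.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a statistical filter (not bogofilter's algorithm)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of registered spam messages containing it
        self.ham = Counter()   # token -> number of registered ham messages containing it

    def score(self, tokens):
        """Return a spamicity in [0, 1]; 0.5 means no evidence either way."""
        probs = []
        for t in set(tokens):
            s, h = self.spam[t], self.ham[t]
            if s + h:
                # Smoothed per-token estimate; unseen tokens are skipped.
                probs.append((s + 0.5) / (s + h + 1.0))
        return sum(probs) / len(probs) if probs else 0.5

    def register(self, tokens, is_spam):
        """Add one message to the spam or ham counts."""
        (self.spam if is_spam else self.ham).update(set(tokens))


def train_on_error(filt, messages, cutoff=0.5, spam_margin=0.1, ham_margin=0.1):
    """Register a message only if it is misclassified or inside its margin.

    Equal margins give a symmetric security margin, unequal ones an
    asymmetric margin. Returns (spam_registered, ham_registered).
    """
    counts = {True: 0, False: 0}
    for tokens, is_spam in messages:
        s = filt.score(tokens)
        if is_spam:
            needs_training = s < cutoff + spam_margin
        else:
            needs_training = s > cutoff - ham_margin
        if needs_training:
            filt.register(tokens, is_spam)
            counts[is_spam] += 1
    return counts[True], counts[False]
```

Dividing the two returned counts gives the spam-to-ham ratio of the
messages used for training, which is the quantity discussed above.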
pi