breaking the training db
Peter Bishop
pgb at adelard.com
Mon Sep 22 17:21:57 CEST 2003
On 22 Sep 2003 at 9:36, David Relson wrote:
> I've been thinking about the header tagging changes and realize that the
> effect is wider spread than I initially thought. The changes add
> "head:" to _all_ header tokens that aren't already tagged with subj:,
> to:, from:, or rtrn:. The effect is to stop using a whole group of
> tokens and start using a new and different set. Bogofilter's accuracy
> may well be lower until sufficient training is done. Drat!!
>
Hmm - might be a case for "degeneration", i.e. if you cannot find
"head:token", use the count for "token" instead.
Degeneration should prevent a drop in accuracy during the transition
phase (ditto for other changes in token handling, like case-sensitive
tokens).
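
A minimal sketch of that fallback, in Python rather than bogofilter's C.
The function name, the (spam, good) count pairs, and the dict standing in
for the wordlist are all assumptions made for illustration; this is not
bogofilter's actual lookup code.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical count record: (spam_count, good_count).
Counts = Tuple[int, int]

HEAD_PREFIX = "head:"


def lookup_with_degeneration(
    token: str, wordlist: Mapping[str, Counts]
) -> Optional[Counts]:
    """Return counts for token, falling back to its untagged form.

    If "head:token" is not in the wordlist, use the counts stored
    under plain "token" instead. Returns None if neither is present.
    """
    # Prefer the exact (tagged) token once training has produced it.
    if token in wordlist:
        return wordlist[token]

    # Degenerate: strip the "head:" tag and retry with the old-style token.
    if token.startswith(HEAD_PREFIX):
        return wordlist.get(token[len(HEAD_PREFIX):])

    return None


# A wordlist trained before the tagging change, plus one new-style entry.
old_and_new = {"received": (3, 40), "head:x-mailer": (10, 2)}

exact = lookup_with_degeneration("head:x-mailer", old_and_new)
fallback = lookup_with_degeneration("head:received", old_and_new)
missing = lookup_with_degeneration("head:unknown", old_and_new)
```

Here the tagged entry is used when it exists, the old untagged counts are
used when it does not, and a token unknown in both forms still yields
nothing, so the fallback only matters during the transition.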
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk