breaking the training db

Peter Bishop pgb at adelard.com
Mon Sep 22 17:21:57 CEST 2003


On 22 Sep 2003 at 9:36, David Relson wrote:

> I've been thinking about the header tagging changes and realize that the
> effect is more widespread than I initially thought.  The changes add
> "head:" to _all_ header tokens that aren't already tagged with subj:,
> to:, from:, or rtrn:.  The effect is to stop using a whole group of
> tokens and start using a new and different set.  Bogofilter's accuracy
> may well be lower until sufficient training is done.  Drat!!
> 
Hmm - might be a case for "degeneration", i.e. if you cannot find
"head:token", use the count for "token" instead.

Degeneration should prevent a drop in accuracy during the transition 
phase (ditto for other changes in token handling like case sensitive 
tokens).
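A minimal sketch of the degeneration idea, assuming the wordlist is a plain dict from token strings to hypothetical (spam, ham) count pairs. The names `strip_tag`, `lookup_with_degeneration` and `TAG_PREFIXES` are illustrative only and are not bogofilter's code or database format. The tag list is taken from the tags named in the thread above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical count pair: (spam_count, ham_count). Not bogofilter's real format.
Counts = Tuple[int, int]

# Assumed tag prefixes, taken from the tags mentioned in the thread.
TAG_PREFIXES = ("head:", "subj:", "to:", "from:", "rtrn:")


def strip_tag(token: str) -> Optional[str]:
    """Return the untagged form of token, or None if it has no known tag."""
    for prefix in TAG_PREFIXES:
        if token.startswith(prefix):
            return token[len(prefix):]
    return None


def lookup_with_degeneration(wordlist: Dict[str, Counts], token: str) -> Counts:
    """Look up token; if it is absent, fall back to its untagged form."""
    if token in wordlist:
        return wordlist[token]
    base = strip_tag(token)
    if base is not None and base in wordlist:
        return wordlist[base]
    return (0, 0)  # unseen in both tagged and untagged form


if __name__ == "__main__":
    # Old database: trained before header tokens were tagged with "head:".
    old_db = {"received": (40, 3)}
    print(lookup_with_degeneration(old_db, "head:received"))  # (40, 3)

    # After retraining, the tagged count exists and takes precedence.
    new_db = {"head:received": (5, 1), "received": (40, 3)}
    print(lookup_with_degeneration(new_db, "head:received"))  # (5, 1)
```

The tagged count is always preferred once it exists, so the fallback only matters during the transition, before enough training has populated the new "head:" tokens.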

-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk
