breaking the training db
David Relson
relson at osagesoftware.com
Mon Sep 22 18:06:05 CEST 2003
On Mon, 22 Sep 2003 16:21:57 +0100
"Peter Bishop" <pgb at adelard.com> wrote:
> On 22 Sep 2003 at 9:36, David Relson wrote:
>
> > I've been thinking about the header tagging changes and realize that
> > the effect is wider spread than I initially thought. The changes
> > add "head:" to _all_ header tokens that aren't already tagged with
> > subj:, to:, from:, or rtrn:. The effect is to stop using a whole
> > group of tokens and start using a new and different set.
> > Bogofilter's accuracy may well be lower until sufficient training is
> > done. Drat!!
> >
> Hmm - might be a case for "degeneration", i.e. if you cannot find
> "head:token", use the count for "token" instead.
>
> Degeneration should prevent a drop in accuracy during the transition
> phase (ditto for other changes in token handling like case sensitive
> tokens).
Peter,
Good thought. My earlier tests showed that degeneration was
detrimental, rather than beneficial. It might be worth turning it on to
help with the new "head:" tokens.
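
To make the idea concrete, here is a rough sketch of the fallback lookup Peter describes. It is not bogofilter's actual code: the dict-based wordlist, the function name lookup_counts, and the (spam, ham) count pairs are all made up for illustration.

```python
# Hypothetical sketch of "degeneration": if the tagged token is unknown,
# fall back to the counts stored for the untagged token.
# The wordlist layout and all names here are assumptions, not bogofilter's API.

def lookup_counts(wordlist, token, prefixes=("head:",)):
    """Return (spam, ham) counts for token, degenerating to the bare token."""
    if token in wordlist:
        return wordlist[token]
    for prefix in prefixes:
        if token.startswith(prefix):
            bare = token[len(prefix):]
            if bare in wordlist:
                return wordlist[bare]
    # Neither the tagged nor the bare form has been seen in training.
    return (0, 0)


# Toy wordlist trained before the "head:" change, plus one new-style entry.
wordlist = {
    "received": (10, 12),
    "localhost": (3, 40),
    "head:x-mailer": (7, 2),
}

print(lookup_counts(wordlist, "head:received"))  # falls back to "received"
print(lookup_counts(wordlist, "head:x-mailer"))  # exact match is preferred
print(lookup_counts(wordlist, "head:unseen"))    # unknown either way
```

With something like this, old untagged counts would keep contributing until enough "head:" tokens have been trained to take over.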
By the way, are you using degeneration? If I recall, you requested it
when the parser was changed some months ago...
David