breaking the training db

David Relson relson at osagesoftware.com
Mon Sep 22 18:06:05 CEST 2003


On Mon, 22 Sep 2003 16:21:57 +0100
"Peter Bishop" <pgb at adelard.com> wrote:

> On 22 Sep 2003 at 9:36, David Relson wrote:
> 
> > I've been thinking about the header tagging changes and realize that
> > the effect is more widespread than I initially thought.  The changes
> > add "head:" to _all_ header tokens that aren't already tagged with
> > subj:, to:, from:, or rtrn:.  The effect is to stop using a whole
> > group of tokens and start using a new and different set. 
> > Bogofilter's accuracy may well be lower until sufficient training is
> > done.  Drat!!
> > 
> Hmm - might be a case for "degeneration", i.e. if you cannot find
> "head:token", use the count for "token" instead.
> 
> Degeneration should prevent a drop in accuracy during the transition 
> phase (ditto for other changes in token handling like case sensitive 
> tokens).

Peter,

Good thought.  My earlier tests showed that degeneration was
detrimental, rather than beneficial.  It might be worth turning it on to
help with the new "head:" tokens.
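A minimal sketch of the tagging and degeneration ideas discussed above. It assumes the wordlist can be modeled as a plain dict mapping token strings to (spam_count, good_count) pairs. The function names `tag_header_token` and `lookup_with_degeneration` are hypothetical and are not bogofilter's actual code or API. The fallback follows Peter's description: if "head:token" is missing, use the count for "token" instead.

```python
# Hypothetical sketch; not bogofilter's implementation.
# Tags that header tokens may already carry (per the message above).
EXPLICIT_TAGS = ("subj:", "to:", "from:", "rtrn:")
HEAD_TAG = "head:"


def tag_header_token(token: str) -> str:
    """Prefix a header token with "head:" unless it already has an explicit tag."""
    if token.startswith(EXPLICIT_TAGS):
        return token
    return HEAD_TAG + token


def lookup_with_degeneration(wordlist: dict, token: str) -> tuple:
    """Return (spam_count, good_count) for token.

    If a "head:"-tagged token is not in the wordlist, degenerate to the
    untagged form, which an older training database may still contain.
    Unknown tokens yield (0, 0).
    """
    if token in wordlist:
        return wordlist[token]
    if token.startswith(HEAD_TAG):
        base = token[len(HEAD_TAG):]
        if base in wordlist:
            return wordlist[base]
    return (0, 0)
```

With an older database that only knows the untagged token, `lookup_with_degeneration({"received": (12, 3)}, tag_header_token("received"))` falls back to the "received" entry and returns (12, 3). Once retraining has populated "head:received", that entry takes precedence and the fallback no longer applies.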

By the way, are you using degeneration?  If I recall correctly, you
requested it when the parser was changed some months ago...

David
