Security margins in training (on error and to exhaustion)

David Relson relson at osagesoftware.com
Wed Dec 10 14:27:14 CET 2003


On Wed, 10 Dec 2003 14:03:25 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
> >> > To summarize:  a larger margin builds a larger database and gives
> >> > better classification results.
> >> 
> >> At least up to some point. As a KISS answer I'd suggest
> >> using spam_cutoff+-0.3 as the training interval (assuming
> >> ham_cutoff = spam_cutoff, IOW: ham_cutoff=0).
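A minimal sketch of the margin rule quoted above, for binary mode (ham_cutoff=0). It assumes bogofilter-style scores in [0, 1], with a message counted as spam when its score is >= spam_cutoff. A message is trained if it is misclassified, or if it is classified correctly but its score falls inside spam_cutoff +- margin. The helper name needs_training and the example cutoff 0.5 are hypothetical. They are not bogofilter options or code.

```python
# Sketch: train-on-error with a safety margin, binary mode (ham_cutoff=0).
# Assumption: a message counts as spam when score >= spam_cutoff.
# needs_training() is a hypothetical helper, not part of bogofilter.


def needs_training(score: float, is_spam: bool,
                   spam_cutoff: float, margin: float = 0.3) -> bool:
    """Return True if the message should be trained.

    A message is trained when it is misclassified, or when it is
    classified correctly but scored within `margin` of the cutoff.
    """
    if is_spam:
        # Spam counts as "safe" only at or above spam_cutoff + margin;
        # anything lower, including misclassified spam, gets trained.
        return score < spam_cutoff + margin
    # Ham counts as "safe" only at or below spam_cutoff - margin.
    return score > spam_cutoff - margin


if __name__ == "__main__":
    # Hypothetical (score, is_spam) pairs, with spam_cutoff = 0.5.
    messages = [(0.95, True), (0.70, True), (0.40, True),
                (0.05, False), (0.35, False), (0.60, False)]
    for score, is_spam in messages:
        label = "spam" if is_spam else "ham"
        print(f"{label:4} score={score:.2f} "
              f"train={needs_training(score, is_spam, 0.5)}")
```

Training "to exhaustion" would repeat such a pass over the corpus until no message is selected. That outer loop is not shown.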
> > 
> > I would _never_ use spam_cutoff=ham_cutoff. 
> > 
> > Using tri-state classification lets my MUA filter all the Unsures
> > into a single folder so I know which messages weren't classifiable. 
> > It beats the heck out of having the messages filtered into the many
> > possible different folders I have.
> 
> OK, here is my tri-state KISS answer ;-) Let mid be the
> midpoint between your two cutoffs. Train with mid+-0.3. Rate
> messages with your original cutoffs (which lie closer to mid).
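A rough sketch of the tri-state rule above, under stated assumptions. It assumes scores in [0, 1], with score >= spam_cutoff rated Spam, score <= ham_cutoff rated Ham, and anything in between rated Unsure. Messages are rated with the original cutoffs, but training uses the wider band mid +- 0.3. The names Cutoffs, classify, training_band and needs_training, and the cutoff values 0.3/0.7, are hypothetical and are only for illustration.

```python
# Sketch of the tri-state rule: rate with the original cutoffs,
# train against the wider band mid +- margin.
# Assumptions: scores lie in [0, 1]; score >= spam_cutoff is Spam,
# score <= ham_cutoff is Ham, anything in between is Unsure.
# All names here are hypothetical, not bogofilter code.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    ham_cutoff: float
    spam_cutoff: float


def classify(score: float, c: Cutoffs) -> str:
    """Rate a message with the original (narrower) cutoffs."""
    if score >= c.spam_cutoff:
        return "Spam"
    if score <= c.ham_cutoff:
        return "Ham"
    return "Unsure"


def training_band(c: Cutoffs, margin: float = 0.3) -> tuple[float, float]:
    """Return (mid - margin, mid + margin), mid being the cutoffs' midpoint."""
    mid = (c.ham_cutoff + c.spam_cutoff) / 2
    return mid - margin, mid + margin


def needs_training(score: float, is_spam: bool, c: Cutoffs,
                   margin: float = 0.3) -> bool:
    """Train any message that does not clear the far edge of the band
    on its own side (this also catches every misrated message)."""
    low, high = training_band(c, margin)
    return score < high if is_spam else score > low


if __name__ == "__main__":
    c = Cutoffs(ham_cutoff=0.3, spam_cutoff=0.7)  # hypothetical values
    print(training_band(c))
    for score, is_spam in [(0.95, True), (0.75, True), (0.5, False),
                           (0.25, False), (0.1, False)]:
        print(score, is_spam, classify(score, c),
              needs_training(score, is_spam, c))
```

With these example cutoffs the training band (about 0.2 to 0.8) is wider than the Unsure zone (0.3 to 0.7). A ham scoring 0.25 is therefore rated Ham but still trained. If mid + 0.3 exceeds 1.0, which happens with a high spam_cutoff, then every spam message is selected for training.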

OK.  Word it as you think best and I'll revise as I think best :-)




More information about the Bogofilter mailing list