tuning and archives

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Feb 23 14:24:15 CET 2004


Tom Allison wrote:

> I'm worried that if the imbalance gets too great, it will start to drop 
> in accuracy.
> 
> I'm currently configured to '-u' all email.  It appears that this 
> imbalance in the histogram may give a visual reason why you might not 
> want to do that all the time since it might augment the imbalance.

I am not a friend of full training, as you probably know,
but here is my answer:

I really don't know whether it matters that the tokens in
the database are balanced in number. Balancing them in
content is what training to exhaustion does, and that would
simply make your problem vanish.
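
To illustrate what "training to exhaustion" means here, a
minimal sketch in Python. It is not bogofilter's code: the
function name and the classify/train callbacks are
hypothetical placeholders for whatever classifier you use.
The idea is to train only on messages that are currently
misclassified and to repeat until a full pass has no errors.

```python
def train_to_exhaustion(messages, classify, train, max_rounds=20):
    """Train only on errors, repeating until a pass is error-free.

    messages: list of (text, is_spam) pairs with known labels.
    classify: hypothetical callback, text -> True if judged spam.
    train:    hypothetical callback, (text, is_spam) -> None.
    Returns the number of passes needed, or None if max_rounds
    passes were not enough.
    """
    for round_no in range(1, max_rounds + 1):
        errors = 0
        for text, is_spam in messages:
            # Only misclassified messages are added to the database.
            if classify(text) != is_spam:
                train(text, is_spam)
                errors += 1
        if errors == 0:
            return round_no
    return None
```

Because correctly classified messages are never registered,
the database grows only where the classifier was wrong.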

If you still want to use -u, there is a new parameter,
introduced recently by David, which uses something similar
to the concept of security margins. With it you only train
on messages which are not "obviously ham/spam". This might
serve you. You'll find it in the example config file.
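
The security-margin idea can be sketched roughly as
follows, in Python. This is only an illustration under
assumptions: the function name, the margin value and the
0..1 score scale are made up here and are not the actual
parameter from the example config file.

```python
def should_train(score: float, margin: float = 0.1) -> bool:
    """Decide whether a message is worth training on.

    score:  spam score between 0.0 (ham) and 1.0 (spam).
    margin: hypothetical safety margin; scores within this
            distance of 0.0 or 1.0 count as "obviously
            ham/spam" and are skipped.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # Train only when the score is clear of both extremes.
    return margin < score < 1.0 - margin
```

With a margin of 0.1, a message scoring 0.999 would be
skipped, while one scoring 0.6 would still be used for
training.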

pi



