Training from scratch.

Tom Eastman tom at
Thu Jul 15 15:35:04 CEST 2004

On Friday 16 July 2004 01:21, Tom Anderson wrote:
> I would set the min_dev to a relatively high value when your database is
> small.  This way, more email is properly classified as unsure (since
> bogofilter really is unsure at this point on most things) and not
> misclassified.  Only after you see certain tokens multiple times will they
> start to affect scoring.  Otherwise it would be quite possible to
> misclassify emails due to common words only showing up in spam at first,
> and then you get a false positive when they show up in a ham.  With a
> higher min_dev, it should be a relatively smooth transition from mostly
> unsures to mostly correct classifications, without ever having lots of
> misclassifications.

That's really surprising, I thought a high min_dev would have the opposite 
effect -- that scores would be more likely to be close to 0.0 or 1.0.  

I had intuited that a low min_dev would mean that there were more neutral-ish 
tokens that would push the score towards 0.5.  

Am I just confusing myself?
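
One way to check the intuition is a toy calculation. The sketch below is not
bogofilter's code: it assumes Robinson's geometric-mean combining rule, made-up
per-token spam probabilities, and a neutral 0.5 result when no token survives
the min_dev cut. The function name and the sample values are hypothetical.

```python
import math


def toy_score(token_probs, min_dev):
    """Combine per-token spam probabilities into one score.

    Tokens whose probability lies within min_dev of 0.5 are ignored.
    Assumes every probability is strictly between 0 and 1.
    Returns (score, number_of_tokens_used).
    """
    kept = [p for p in token_probs if abs(p - 0.5) >= min_dev]
    if not kept:
        # Assumption for this sketch: no usable tokens means a neutral score.
        return 0.5, 0
    n = len(kept)
    # Robinson's geometric-mean combination (a simplification, not Fisher's method).
    spamminess = 1.0 - math.exp(sum(math.log(1.0 - p) for p in kept) / n)
    hamminess = 1.0 - math.exp(sum(math.log(p) for p in kept) / n)
    s = (spamminess - hamminess) / (spamminess + hamminess)
    return (1.0 + s) / 2.0, n


# Hypothetical message: six near-neutral tokens and two strongly spammy ones.
sample = [0.52, 0.48, 0.55, 0.45, 0.60, 0.42, 0.93, 0.97]

for min_dev in (0.0, 0.375, 0.49):
    score, used = toy_score(sample, min_dev)
    print(f"min_dev={min_dev:<5} tokens used={used} score={score:.3f}")
```

With these made-up numbers, raising min_dev drops the near-neutral tokens and
moves the score away from 0.5, until min_dev is so high that no token survives
and the result falls back to neutral. The sketch says nothing about how the
per-token probabilities themselves behave while the database is still small.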

