Training from scratch.
Tom Eastman
tom at celleste.no-ip.org
Thu Jul 15 15:35:04 CEST 2004
On Friday 16 July 2004 01:21, Tom Anderson wrote:
> I would set the min_dev to a relatively high value when your database is
> small. This way, more email is properly classified as unsure (since
> bogofilter really is unsure at this point on most things) and not
> misclassified. Only after you see certain tokens multiple times will they
> start to affect scoring. Otherwise it would be quite possible to
> misclassify emails due to common words only showing up in spam at first,
> and then you get a false positive when they show up in a ham. With a
> higher min_dev, it should be a relatively smooth transition from mostly
> unsures to mostly correct classifications, without ever having lots of
> misclassifications.
That's really surprising, I thought a high min_dev would have the opposite
effect -- that scores would be more likely to be close to 0.0 or 1.0.
I had intuited that a low min_dev would mean that there were more neutral-ish
tokens that would push the score towards 0.5.
Am I just confusing myself?
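
One way to check the intuition is a small sketch of the mechanism described
in the quoted text. This is only an illustration under stated assumptions,
not bogofilter's actual code: the function names are hypothetical, the
smoothing is Robinson-style with made-up parameters (s=1.0, x=0.5, which
are not claimed to be bogofilter's defaults), and raw counts are used as if
the spam and ham corpora were the same size. The point it shows is that
min_dev excludes tokens near 0.5 instead of adding them, and that tokens
seen only once or twice stay near 0.5 after smoothing, so a higher min_dev
leaves fewer tokens to score with while the database is small.

```python
def smoothed_prob(spam_count, ham_count, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability for one token.

    s and x are illustrative assumptions, not bogofilter defaults.
    Raw counts stand in for frequencies (equal-sized corpora assumed).
    """
    n = spam_count + ham_count
    p = spam_count / n if n else x
    return (s * x + n * p) / (s + n)


def significant_tokens(counts, min_dev):
    """Return the tokens whose smoothed probability is at least
    min_dev away from 0.5; everything closer to 0.5 is ignored."""
    kept = {}
    for token, (spam_count, ham_count) in counts.items():
        f = smoothed_prob(spam_count, ham_count)
        if abs(f - 0.5) >= min_dev:
            kept[token] = f
    return kept


# A tiny, freshly trained database: (spam_count, ham_count) per token.
counts = {
    "free": (1, 0),      # seen once, in spam only
    "meeting": (0, 1),   # seen once, in ham only
    "viagra": (5, 0),    # seen five times, in spam only
}

for min_dev in (0.1, 0.3):
    print(min_dev, sorted(significant_tokens(counts, min_dev)))
```

With the low min_dev all three tokens take part in scoring, including the
two that have been seen only once. With the higher min_dev only the token
seen five times survives, so a message made up of rarely seen words has
little or no evidence left to score on.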
Tom