For users without corpora

Tom Anderson tanderso at oac-design.com
Thu Dec 2 01:06:55 CET 2004


On Wed, 2004-12-01 at 13:29, Boris 'pi' Piwinger wrote:
> Tom Anderson said:
> >> What are your thoughts on using bogofilter with a person who doesn't
> >> have a collection of spam and ham mails? Is it better to provide them
> >> with a collection of both for initial training and then train on error,
> >> or to ask them to wait until they've got enough spam and ham mails to do
> >> the training?
> >
> > Just set reasonable cutoffs, use -u,
> 
> That sounds really dangerous.

Not at all.  By "reasonable cutoffs", I mean a high spam cutoff (~0.9),
a low ham cutoff (~0.1), and a robx somewhere between them, inside the
min_dev range.  In this configuration, a new user running bogofilter
with -u starts out with every single email scoring as unsure, so there
can be no incorrect classifications yet.  Only once you start training
it on messages as ham or spam will new emails begin receiving a
non-neutral classification.  And as David said, new users should watch
these classifications very carefully for the first few days and make
corrections right away.  After that, it should be well over 95% correct
in its classifications, with near-zero false positives.  At that point,
it's time to consider adjusting the cutoffs to reduce unsures.  This is
the safest way to do it, IMHO.
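
To illustrate why nothing can be misclassified at the start, here is a
rough Python sketch of the three-way decision.  It is not bogofilter's
actual code: the names classify() and score_message() are made up for
this example, the averaging is a deliberate simplification of the real
scoring, robx = 0.5 is just one value between the cutoffs, and the
inclusive comparisons at the cutoffs are an assumption.

```python
# Illustrative sketch only; NOT bogofilter's real scoring algorithm.
SPAM_CUTOFF = 0.9   # high spam cutoff, as suggested above
HAM_CUTOFF = 0.1    # low ham cutoff, as suggested above
ROBX = 0.5          # assumed score given to tokens never seen before


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Map a spamicity score to spam, ham, or unsure using two cutoffs."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def score_message(tokens, wordlist, robx=ROBX):
    """Toy score: the plain average of per-token scores.

    `wordlist` is a hypothetical dict of token -> score; tokens missing
    from it fall back to robx.  The real program combines token scores
    differently; averaging is used here only to keep the sketch short.
    """
    scores = [wordlist.get(token, robx) for token in tokens]
    return sum(scores) / len(scores) if scores else robx


# With an empty wordlist every token scores robx, so the message
# lands between the two cutoffs and is reported as unsure.
empty_wordlist = {}
print(classify(score_message(["cheap", "pills", "now"], empty_wordlist)))
```

With nothing trained, every message falls back to robx and stays
between the cutoffs; only trained tokens can pull a score out to
either end.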

Tom




