For users without corpora

David Relson relson at osagesoftware.com
Thu Dec 2 01:22:00 CET 2004


On 01 Dec 2004 19:06:55 -0500
Tom Anderson wrote:

> On Wed, 2004-12-01 at 13:29, Boris 'pi' Piwinger wrote:
> > Tom Anderson said:
> > >> What are your thoughts on using bogofilter with a person who
> > >> doesn't have a collection of spam and ham mails? Is it better to
> > >> provide them with a collection of both for initial training and
> > >> then train on error, or to ask them to wait until they've got
> > >> enough spam and ham mails to do the training?
> > >
> > > Just set reasonable cutoffs, use -u,
> > 
> > That sounds really dangerous.
> 
> Not at all.  By "reasonable cutoffs", I mean a high spam cutoff
> (~0.9), a low ham cutoff (~0.1), and a robx somewhere between them
> inside of the min_dev range.  In this configuration, running
> bogofilter with -u will make every single email an unsure.  There can
> be no incorrect classifications yet.  Only when you start to classify
> stuff as either ham or spam will new emails begin receiving a
> non-neutral classification.  And as David said, new users should be
> very careful to watch these classifications for the first few days and
> make corrections right away.  After that, it should be well over 95%
> correct in its classifications with near-zero false positives.  At
> that point, it's time to consider adjusting cutoffs to reduce unsures.
> This is the safest way to do it IMHO.
> 
> Tom

Bogofilter's default settings are quite conservative:

   ham_cutoff=0.45
   spam_cutoff=0.99

With the high spam_cutoff value, only obvious spam will be marked
"Spam".  The ham_cutoff value is less critical as any spam classified as
ham will be noticed in the normal course of reading mail.
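
To make the effect of the two cutoffs concrete, here is a rough Python
sketch of the three-way decision. It is only an illustration, not
bogofilter's actual code: the function name classify() is made up, and
the handling of a score that lands exactly on a cutoff is an
assumption. The second column uses the ~0.1 / ~0.9 cutoffs Tom
suggested, for comparison.

```python
# Illustrative sketch only; NOT bogofilter's real implementation.
# classify() is a hypothetical helper, and treating a score exactly
# equal to a cutoff as Ham/Spam is an assumption.

DEFAULT_HAM_CUTOFF = 0.45   # default quoted in this message
DEFAULT_SPAM_CUTOFF = 0.99  # default quoted in this message


def classify(score, ham_cutoff=DEFAULT_HAM_CUTOFF,
             spam_cutoff=DEFAULT_SPAM_CUTOFF):
    """Map a spamicity score in [0, 1] to Ham, Unsure, or Spam."""
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"


if __name__ == "__main__":
    # Compare the defaults with the wider "unsure" band from the quote.
    for score in (0.05, 0.30, 0.70, 0.95, 0.995):
        with_defaults = classify(score)
        with_wide_band = classify(score, ham_cutoff=0.1, spam_cutoff=0.9)
        print(score, with_defaults, with_wide_band)
```

With the defaults, a message scoring 0.95 is still only "Unsure";
nothing short of 0.99 is called "Spam".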

The default settings should be usable "out of the box". 

David


