For users without corpora

David Relson relson at osagesoftware.com
Thu Dec 2 00:01:19 CET 2004


On Wed, 1 Dec 2004 19:29:42 +0100 (CET)
Boris 'pi' Piwinger wrote:

> Tom Anderson said:
> >> What are your thoughts on using bogofilter with a person who
> >> doesn't have a collection of spam and ham mails? Is it better to
> >> provide them with a collection of both for initial training and then
> >> train on error, or to ask them to wait until they've got enough spam
> >> and ham mails to do the training?
> >
> > Just set reasonable cutoffs, use -u,
> 
> That sounds really dangerous.
> 
> > and start training on error from scratch.
> 
> Yep. A one day collection should suffice.
> 
> pi

When I started filtering email, I was very, very careful -- almost
paranoid in my setup.  Any new user who doesn't carefully monitor the
output of _any_ new filter is naive (to be kind).

Either approach, running with '-u' from an empty wordlist or training
on one day's mail, is plenty to get started.  Careful monitoring of the
results is important until the tool is seen to be operating properly.
Sane use of '-u' calls for continued checking of results and using any
misclassifications for training, i.e. train on error.
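
As a rough sketch of what "train on error" means in practice, the
helper below picks the bogofilter flag for a message once a human has
checked it.  The function name and the "spam"/"ham"/"unsure" labels are
made up for illustration; only the flags themselves (-s, -n, -Ns, -Sn)
are bogofilter's.  It assumes that '-u' registers a message under
whatever class it was given and leaves unsures unregistered.

```python
# Hypothetical helper: the name and the string labels are illustrative.
# Only the flags (-s, -n, -Ns, -Sn) are real bogofilter options.
def correction_flags(verdict, truth, autoupdate=True):
    """Return the bogofilter flag for retraining on one message, or None.

    verdict:    what bogofilter said ("spam", "ham" or "unsure")
    truth:      what the user decided after checking ("spam" or "ham")
    autoupdate: True if the message was originally scored with -u
    """
    if verdict not in ("spam", "ham", "unsure"):
        raise ValueError("verdict must be 'spam', 'ham' or 'unsure'")
    if truth not in ("spam", "ham"):
        raise ValueError("truth must be 'spam' or 'ham'")

    if verdict == truth:
        return None  # classified correctly: train on error does nothing

    register = "-s" if truth == "spam" else "-n"
    if verdict == "unsure" or not autoupdate:
        # Nothing was registered for this message yet, so just register it.
        return register

    # -u already registered the message under the wrong class:
    # unregister it there and register it under the right one.
    return "-Ns" if truth == "spam" else "-Sn"
```

For a spam that slipped through as ham under '-u', this returns "-Ns",
which would be applied as `bogofilter -Ns < message`.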

It's also recommended that _all_ unsures be manually classified
and used for training.  The unsures are, perhaps, the most useful
messages for training as they're notably different from the messages
previously used for training.  Were the unsures similar, they'd have
been classified as ham or spam.
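
One way to collect the unsures for manual review is to look at
bogofilter's exit status (0 for spam, 1 for non-spam, 2 for unsure).
The sketch below does that; the function names are hypothetical, and it
assumes bogofilter is on the PATH and that the configuration leaves a
gap between ham_cutoff and spam_cutoff so that an unsure band exists.

```python
import subprocess

# Exit statuses as given in the bogofilter manual; anything else is an error.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def classify(message, run=subprocess.run):
    """Score one raw message (bytes) and return 'spam', 'ham' or 'unsure'.

    `run` is injectable so the logic can be exercised without bogofilter.
    """
    result = run(["bogofilter"], input=message)
    if result.returncode not in VERDICTS:
        raise RuntimeError(
            f"bogofilter failed with status {result.returncode}"
        )
    return VERDICTS[result.returncode]


def needs_review(messages, run=subprocess.run):
    """Return the messages bogofilter could not decide on, in order."""
    return [m for m in messages if classify(m, run) == "unsure"]
```

Each message returned by needs_review() would then be read by the user
and registered with -s or -n as appropriate.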

David


