For users without corpora

Boris 'pi' Piwinger 3.14 at piology.org
Thu Dec 2 08:42:40 CET 2004


David Relson said:

>> > Just set reasonable cutoffs, use -u,
>>
>> That sounds really dangerous.
>>
>> > and start training on error from scratch.
>>
>> Yep. A one day collection should suffice.
>
> When I started filtering email, I was very, very careful -- almost
> paranoid in my setup.  Any new user who doesn't carefully monitor the
> output of _any_ new filter is naive (to be kind).

Right, the only question is how often someone will check his mail.

> Either '-u' or 1 day's mail is plenty to get started.  Careful
> monitoring of the results is important until the tool is seen to be
> operating properly.  Sane use of '-u' calls for continued checking of
> results and using any misclassifications for training, i.e. train on
> error.

My concern with early use of -u is that within a few hours errors can
multiply, since every misclassified message is registered automatically
under the wrong label, and that creates a need for a huge correction
session. This won't happen with a sufficiently stable database.

> It's also recommended that _all_ unsures be manually classified
> and used for training.

I would skip that work: use 2-state classification and let those
messages get picked up when doing correction runs with security margins.
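
To make "correction runs with security margins" concrete, here is a
minimal sketch of train-on-error in 2-state mode. It assumes a toy
token-count scorer, not bogofilter's actual algorithm or API; the names
ToyFilter, needs_training and correction_run and the values of
SPAM_CUTOFF and MARGIN are hypothetical and chosen only for illustration.

```python
from collections import Counter

SPAM_CUTOFF = 0.5  # hypothetical; 2-state means one cutoff and no "unsure" band
MARGIN = 0.2       # hypothetical security margin around the cutoff


class ToyFilter:
    """Toy token-count scorer; a stand-in, not bogofilter's algorithm."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def score(self, text):
        """Return a spamicity in [0, 1]; 0.5 when nothing is known."""
        probs = []
        for tok in set(text.lower().split()):
            s, h = self.spam[tok], self.ham[tok]
            probs.append((s + 0.5) / (s + h + 1.0))
        return sum(probs) / len(probs) if probs else 0.5

    def train(self, text, is_spam):
        """Register each distinct token of the message under its true label."""
        target = self.spam if is_spam else self.ham
        target.update(set(text.lower().split()))


def needs_training(score, is_spam):
    """True if the message is wrong or right only by less than MARGIN."""
    if is_spam:
        return score < SPAM_CUTOFF + MARGIN
    return score > SPAM_CUTOFF - MARGIN


def correction_run(filt, labelled, max_passes=10):
    """Train only on errors and near misses; return how many were trained."""
    trained = 0
    for _ in range(max_passes):
        changed = 0
        for text, is_spam in labelled:
            if needs_training(filt.score(text), is_spam):
                filt.train(text, is_spam)
                changed += 1
        trained += changed
        if changed == 0:
            break
    return trained
```

Messages that already score well clear of the cutoff are never trained,
so the database stays small, while anything that would have landed near
the cutoff is corrected in the same pass without a separate manual
"unsure" review.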

pi


