For users without corpora
Boris 'pi' Piwinger
3.14 at piology.org
Thu Dec 2 08:42:40 CET 2004
David Relson said:
>> > Just set reasonable cutoffs, use -u,
>>
>> That sounds really dangerous.
>>
>> > and start training on error from scratch.
>>
>> Yep. A one day collection should suffice.
>
> When I started filtering email, I was very, very careful -- almost
> paranoid in my setup. Any new user who doesn't carefully monitor the
> output of _any_ new filter is naive (to be kind).
Right, the only question is how often someone will check his mail.
> Either '-u' or 1 day's mail is plenty to get started. Careful
> monitoring of the results is important until the tool is seen to be
> operating properly. Sane use of '-u' calls for continued checking of
> results and using any misclassifications for training, i.e. train on
> error.
My concern with early use of -u is that in a few hours errors can multiply
and hence create a need for a huge correction session. This won't happen
with a sufficiently stable database.
> It's also recommended that _all_ unsures be manually classified
> and used for training.
I would just avoid this work: use 2-state classification and let those
messages be picked up during correction runs with security margins.
pi