training and tuning (was: compile time options)

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Sep 30 18:44:02 CEST 2003


Tom Anderson <tanderso at oac-design.com> wrote:

>> now" values with tiny dbs.  Come to think of it, if we had such a
>> thing, maybe it could even be made part of a classification run -- got
>> a small db, retune it like a harpsichord for every use -- got a big
>> one, treat it like a piano and tune it every 3 months :)
>
>That sounds like a fine idea.  Nobody wants to be manually tuning their
>database as just a regular user.  They (or more likely a server admin)
>just want to set it up and let it work, only sending corrections when
>needed.  If the tuning is done as a part of the
>classification/registration, then it becomes much more user-friendly.

Well, there could be a training script which calls the
tuning script; that is not much different from how it works now.

But this brings me to a question (still unanswered) I asked
recently: 
:I recently thought about some conflicting approaches.
:
:Train on error (as applied by bogominitrain.pl) makes it
:impossible to use bogotune, which was so happy with the
:output already it would not try to optimize further. Well,
:I did optimize before switching to that approach.
:
:Now, here is the problem: In train on error, as opposed to
:full training, the selection of messages to train with is
:highly dependent on the configuration. So this training is
:close to optimal for these settings.
:
:So here is my question: How important are these settings
:after all when train on error is used? Let me give examples:
:
:1) I have a very high min_dev, so if something slips
:through, this is usually due to too few words being used in
:calculating bogosity. So this is a factor.
:
:2) I have a cutoff at .501 (result of some tuning). Now I
:could just move it to .4 or .6 say and train with that. It
:would probably not change much. (I add a security margin for
:training as described in previous messages). Yes, it is
:related to values for unseen words, but?
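
To make the dependence on the cutoff concrete, here is a minimal
sketch of train on error with a security margin. It is only an
illustration under stated assumptions: ToyFilter, its scoring rule,
and train_on_error are hypothetical names made up for this example,
and they are neither bogofilter's algorithm nor the code in
bogominitrain.pl.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a real filter (not bogofilter's algorithm)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def register(self, words, is_spam):
        # Count each distinct word once per registered message.
        (self.spam if is_spam else self.ham).update(set(words))

    def score(self, words):
        # Average per-word spam ratio; unseen words count as 0.5.
        ratios = []
        for w in set(words):
            s, h = self.spam[w], self.ham[w]
            ratios.append(s / (s + h) if s + h else 0.5)
        return sum(ratios) / len(ratios) if ratios else 0.5


def train_on_error(filt, corpus, cutoff=0.5, margin=0.1):
    """Register a message only if it is misclassified or scores within
    `margin` of `cutoff`. Returns the indices of the registered messages."""
    trained = []
    for i, (words, is_spam) in enumerate(corpus):
        score = filt.score(words)
        if is_spam:
            needs_training = score < cutoff + margin
        else:
            needs_training = score > cutoff - margin
        if needs_training:
            filt.register(words, is_spam)
            trained.append(i)
    return trained


corpus = [
    (["buy", "pills"], True),
    (["meeting", "notes"], False),
    (["buy", "pills", "now"], True),
    (["meeting", "agenda"], False),
]

low = train_on_error(ToyFilter(), corpus, cutoff=0.5, margin=0.1)
high = train_on_error(ToyFilter(), corpus, cutoff=0.8, margin=0.1)
print(low, high)
```

In this toy corpus the two cutoffs register different subsets of the
same four messages, which is the effect described above: the wordlist
that results is shaped by the settings in force during training.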

pi



