Use root words to reduce training time
David Relson
relson at osagesoftware.com
Tue May 18 13:09:15 CEST 2004
On Tue, 18 May 2004 01:53:55 -0400
Kevin O'Connor wrote:
> On Mon, May 17, 2004 at 08:44:42AM -0400, David Relson wrote:
> > In token.c there's function get_token(). Modifying that function to
> > return "token" and "root:token" shouldn't be too difficult.
>
> Hrmm. I had not thought of doing it that way.
>
> I was thinking of using the root word as a way of selecting a better
> robX value. (I.e., put the feature in the statistics code instead of
> the token parsing code.)
What do you mean?
> The advantage of your way is that it is easier to implement and fits
> in well with the rest of the code. A possible disadvantage, however,
> is that it could cause root tokens to overly influence the outcome.
Ease of implementation and a good fit are usually two of my goals :-)
As for "overly influence", I don't understand your idea. Can you explain
more fully?
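
To make the token-side idea concrete, here is a rough sketch. It is not
actual bogofilter code: the helpers root_length() and make_root_token()
and the three-entry suffix list are made up for illustration, and a real
version would need a proper stemmer and would have to hook into
get_token() in token.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of bogofilter): approximate a root word
 * by stripping one common English suffix. Returns the root's length. */
static size_t root_length(const char *word)
{
    static const char *suffixes[] = { "ing", "ed", "s" };
    size_t len = strlen(word);

    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t slen = strlen(suffixes[i]);
        /* Require at least three characters to remain as the root. */
        if (len >= slen + 3 && strcmp(word + len - slen, suffixes[i]) == 0)
            return len - slen;
    }
    return len;  /* no suffix matched: the word is its own root */
}

/* Hypothetical helper: write "root:<root>" into out.
 * Returns 1 if the root differs from the word (an extra token should
 * be emitted), 0 if the word is already its own root. */
static int make_root_token(const char *word, char *out, size_t outsize)
{
    size_t rlen = root_length(word);

    assert(outsize > 0);
    if (rlen == strlen(word))
        return 0;
    snprintf(out, outsize, "root:%.*s", (int)rlen, word);
    return 1;
}
```

The tokenizer would still return the word itself, and would return the
extra "root:" token only when make_root_token() reports a difference, so
both get counted by the existing statistics code.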
Cheers!
David