Use root words to reduce training time

David Relson relson at osagesoftware.com
Tue May 18 13:09:15 CEST 2004


On Tue, 18 May 2004 01:53:55 -0400
Kevin O'Connor wrote:

> On Mon, May 17, 2004 at 08:44:42AM -0400, David Relson wrote:
> > In token.c there's function get_token().  Modifying that function to
> > return "token" and "root:token" shouldn't be too difficult.  
> 
> Hrmm.  I had not thought of doing it that way.
> 
> I was thinking of using the root word as a way of selecting a better
> robX value.  (I.e. put the feature in the statistics code instead of
> the token parsing code.)

What do you mean?

> The advantage of your way is that it is easier to implement and fits
> in well with the rest of the code.  A possible disadvantage, however,
> is that it could cause root tokens to overly influence the outcome.

Implementation and fit are commonly two of my goals :-)  "overly
influence" indicates I don't understand your idea.  Can you explain more
fully?
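For concreteness, here is a minimal sketch of the "return token and root:token" idea. It does not reproduce bogofilter's actual get_token() signature. The helper names crude_root() and emit_with_root() are hypothetical, and the suffix stripper is a crude stand-in for a real stemmer, assumed only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical crude stemmer: strips one common English suffix,
 * keeping at least three characters of root. Not a real stemmer. */
static void crude_root(const char *tok, char *out, size_t outsz)
{
    static const char *suffixes[] = { "ing", "ed", "es", "s" };
    size_t len = strlen(tok);
    size_t i;

    assert(outsz > 0);
    snprintf(out, outsz, "%s", tok);        /* default: root == token */
    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t slen = strlen(suffixes[i]);
        if (len > slen + 2 && strcmp(tok + len - slen, suffixes[i]) == 0) {
            if (len - slen < outsz) {
                memcpy(out, tok, len - slen);
                out[len - slen] = '\0';
            }
            return;                         /* strip only the first match */
        }
    }
}

/* Hypothetical: emit both the original token and "root:" + its root,
 * so each becomes a separate feature in the wordlist. */
static void emit_with_root(const char *tok, char out[2][64])
{
    char root[64];

    snprintf(out[0], 64, "%s", tok);
    crude_root(tok, root, sizeof root);
    snprintf(out[1], 64, "root:%s", root);
}
```

With this, "mailing", "mailed" and "mail" all contribute a shared "root:mail" token alongside their own, which is where the training-time savings would come from.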

Cheers!

David


