Use root words to reduce training time

Kevin O'Connor kevin at koconnor.net
Tue May 18 07:53:55 CEST 2004


On Mon, May 17, 2004 at 08:44:42AM -0400, David Relson wrote:
> In token.c there's function get_token().  Modifying that function to
> return "token" and "root:token" shouldn't be too difficult.  

Hrmm.  I had not thought of doing it that way.

I was thinking of using the root word as a way of selecting a better robX
value.  (I.e., put the feature in the statistics code instead of the token
parsing code.)
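
A rough sketch of what that could look like, assuming the usual Robinson
smoothing f(w) = (s*x + n*p) / (s + n) with s > 0. The function names and the
rule "use the root word's probability as the prior when the root has been
seen" are illustrative assumptions, not existing bogofilter code:

```c
/* Robinson's smoothed estimate: f(w) = (s*x + n*p) / (s + n).
 *   s: weight given to the prior (robS), assumed > 0
 *   x: prior used when little is known about the token (robX)
 *   n: number of times the token has been seen
 *   p: raw spam probability observed for the token */
static double robinson_f(double s, double x, double n, double p)
{
    return (s * x + n * p) / (s + n);
}

/* Hypothetical helper: choose the prior for a token.
 * If the token's root word has been seen (root_n > 0), use the root's raw
 * probability as the prior; otherwise fall back to the global robX. */
static double choose_robx(double default_robx, double root_n, double root_p)
{
    return root_n > 0.0 ? root_p : default_robx;
}
```

With s = 1 and a global robX of 0.5, a token seen once in spam scores
(0.5 + 1) / 2 = 0.75. If its root word is known with probability 0.9, the
same token scores (0.9 + 1) / 2 = 0.95, so the root only shifts tokens that
have little data of their own.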

The advantage of your way is that it is easier to implement and fits in
well with the rest of the code.  A possible disadvantage, however, is that
it could cause root tokens to overly influence the outcome.
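
For comparison, a minimal sketch of the token-side approach. This is not the
real get_token(); naive_root_len() is a hypothetical stand-in for a proper
stemmer, and make_root_token() is an invented helper that builds the extra
"root:" token a parser could return alongside the original word:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a real stemmer: strips a trailing "ing" or "s".
 * Returns the length of the root within word. */
static size_t naive_root_len(const char *word)
{
    size_t len = strlen(word);
    if (len > 5 && strcmp(word + len - 3, "ing") == 0)
        return len - 3;
    if (len > 3 && word[len - 1] == 's')
        return len - 1;
    return len;
}

/* Write "root:<root>" into out.
 * Returns 1 on success, 0 if out is too small to hold the result. */
static int make_root_token(const char *word, char *out, size_t outsz)
{
    size_t rlen = naive_root_len(word);
    if (rlen + sizeof "root:" > outsz)   /* sizeof counts the trailing NUL */
        return 0;
    snprintf(out, outsz, "root:%.*s", (int)rlen, word);
    return 1;
}
```

Under this sketch every word yields two tokens, e.g. "mortgages" and
"root:mortgage". The two are strongly correlated, which is where the concern
about root tokens weighing too heavily on the result comes from.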

Thanks,
-Kevin


