Recent changes to token.c

David Relson relson at osagesoftware.com
Sun Mar 13 22:43:30 CET 2005


On Sun, 13 Mar 2005 22:19:15 +0100
Matthias Andree wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > Typically, collect_words() calls get_token() which calls the lexer.
> > The lexer parses a token, which get_token() puts into a word_t, and
> > collect_words() saves the token text in its wordhash.  For storage
> > efficiency, a wordhash preallocates storage for a bunch of nodes and
> > strings.  When the wordhash needs to store the token, it copies the
> > token's info into the preallocated storage.  The word_t can then be
> > freed.
> >
> > Since the use of word_t is temporary and tokens have maximum lengths
> > (MAXTOKEN), there's no need to dynamically allocate/free word_t -- a
> > statically defined one works just fine.  This also saves all the malloc/
> > free calls, hence is a bit faster.
> 
> OK. Does this impair the general usefulness of word_t outside this
> context, aside from message identifiers?

I don't think this has much effect on their usefulness.  The need for
allocating/deallocating them has been lessened, though "grep word_t
*.c" finds 112 places where they're used.
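
For anyone following along, here is a rough sketch of the pattern
described in the quoted text: one statically defined word buffer reused
for every token, with the token text copied into preallocated storage.
It is not the code in token.c.  Every name in it (sketch_word_t,
sketch_pool_t, sketch_get_token, and so on) is hypothetical, the
whitespace splitter only stands in for the real lexer, the flat pool
only stands in for the wordhash, and MAXTOKEN_SKETCH is an assumed
value, not bogofilter's MAXTOKEN.

```c
#include <stddef.h>
#include <string.h>

#define MAXTOKEN_SKETCH 30    /* assumed limit, stands in for MAXTOKEN */
#define POOL_BYTES      4096  /* assumed size of the preallocated pool */

/* Hypothetical stand-in for word_t: a length plus a fixed-size buffer. */
typedef struct {
    size_t leng;
    unsigned char text[MAXTOKEN_SKETCH + 1];
} sketch_word_t;

/* Hypothetical stand-in for the wordhash's preallocated string storage. */
typedef struct {
    char   pool[POOL_BYTES];
    size_t used;   /* bytes consumed so far */
    size_t count;  /* tokens stored so far */
} sketch_pool_t;

/* Scan the next whitespace-delimited token into the caller's word.
 * Returns 1 if a token was found, 0 at end of input.  Tokens longer
 * than MAXTOKEN_SKETCH are truncated, so the buffer never overflows. */
static int sketch_get_token(const char **cursor, sketch_word_t *w)
{
    const char *p = *cursor;
    size_t n = 0;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n') {
        if (n < MAXTOKEN_SKETCH)
            w->text[n++] = (unsigned char)*p;
        p++;
    }
    w->text[n] = '\0';
    w->leng = n;
    *cursor = p;
    return 1;
}

/* Copy the token (with its terminating NUL) into the pool.
 * Returns the stored copy, or NULL when the pool is full. */
static const char *sketch_pool_store(sketch_pool_t *pool,
                                     const sketch_word_t *w)
{
    char *dst;

    if (pool->used + w->leng + 1 > POOL_BYTES)
        return NULL;
    dst = pool->pool + pool->used;
    memcpy(dst, w->text, w->leng + 1);
    pool->used += w->leng + 1;
    pool->count++;
    return dst;
}

/* One static word serves every token: no malloc/free per token.
 * Returns the number of tokens stored. */
static size_t sketch_collect_words(const char *input, sketch_pool_t *pool)
{
    static sketch_word_t word;  /* reused on each iteration */
    size_t stored = 0;

    while (sketch_get_token(&input, &word)) {
        if (sketch_pool_store(pool, &word) == NULL)
            break;
        stored++;
    }
    return stored;
}
```

The static buffer is safe here only because each token is copied out
before the next one is scanned.  A caller that keeps a pointer to the
word across calls, or that calls in from more than one thread, would
still need its own copy.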



