Recent changes to token.c

Sun Mar 13 17:56:55 CET 2005

Matthias,

FYI, when I was looking at bogofilter's use of heap storage (malloc and
free), I noticed an interesting pattern involving creation and saving
of tokens.  

Typically, collect_words() calls get_token(), which calls the lexer.
The lexer parses a token, which get_token() puts into a word_t, and
collect_words() saves the token text in its wordhash.  For storage
efficiency, a wordhash preallocates storage for a bunch of nodes and
strings.  When the wordhash needs to store the token, it copies the
token's info into the preallocated storage.  The word_t can then be
freed.
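
A minimal sketch of the "copy into preallocated storage" step, for
illustration only.  The names tok_t, strpool_t, pool_store and
POOL_BYTES are hypothetical stand-ins, not bogofilter's actual
wordhash or word_t definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not bogofilter's actual definitions. */
#define POOL_BYTES 4096

typedef struct {
    size_t      leng;   /* token length in bytes */
    const char *text;   /* token bytes, not necessarily NUL-terminated */
} tok_t;

typedef struct {
    char   pool[POOL_BYTES]; /* string storage reserved up front */
    size_t used;             /* bytes of pool already handed out */
} strpool_t;

/* Copy a token's text into the pool and return the stored copy,
 * or NULL when the pool has no room left.  No malloc() per token. */
static const char *pool_store(strpool_t *p, const tok_t *t)
{
    char *dst;

    assert(p != NULL && t != NULL);
    if (t->leng + 1 > POOL_BYTES - p->used)
        return NULL;

    dst = p->pool + p->used;
    memcpy(dst, t->text, t->leng);
    dst[t->leng] = '\0';
    p->used += t->leng + 1;
    return dst;
}
```

Once pool_store() returns, the pool owns its own copy of the text, so
the temporary token that carried it is no longer needed.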

Since the use of word_t is temporary and tokens have a maximum length
(MAXTOKENLEN), there's no need to dynamically allocate/free word_t -- a
statically defined one works just fine.  This also saves all the
malloc/free calls, and hence is a bit faster.
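
Roughly, the idea looks like the sketch below.  It is not the code in
token.c; token_buf_t, next_token and MAX_TOKEN_LEN (with its value of
30) are made-up names and numbers, and the whitespace splitting stands
in for the real lexer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 30   /* hypothetical limit, standing in for MAXTOKENLEN */

typedef struct {
    size_t leng;                     /* bytes used in text */
    char   text[MAX_TOKEN_LEN + 1];  /* fixed-size buffer, NUL-terminated */
} token_buf_t;

/* Return the next space-delimited token starting at *cursor, or NULL
 * at end of input.  The result lives in one static buffer that is
 * overwritten on every call, so nothing is allocated or freed.
 * Tokens longer than MAX_TOKEN_LEN are truncated in this sketch. */
static const token_buf_t *next_token(const char **cursor)
{
    static token_buf_t tok;
    const char *s;

    assert(cursor != NULL && *cursor != NULL);
    s = *cursor;

    while (*s == ' ')
        s++;
    if (*s == '\0') {
        *cursor = s;
        return NULL;
    }

    tok.leng = 0;
    while (*s != '\0' && *s != ' ') {
        if (tok.leng < MAX_TOKEN_LEN)
            tok.text[tok.leng++] = *s;
        s++;
    }
    tok.text[tok.leng] = '\0';
    *cursor = s;
    return &tok;
}
```

The caller has to copy the text somewhere permanent before asking for
the next token, which is exactly what the wordhash already does.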

Message IDs are an exception to bogofilter's MAXTOKENLEN constant.  As
they can be much longer, they get special treatment.

HTH,

David

_______________________________________________
Bogofilter-dev mailing list
Bogofilter-dev at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter-dev