Recent changes to token.c
David Relson
relson at osagesoftware.com
Sun Mar 13 17:56:55 CET 2005
Matthias,
FYI, when I was looking at bogofilter's use of heap storage (malloc and
free), I noticed an interesting pattern involving creation and saving
of tokens.
Typically, collect_words() calls get_token(), which calls the lexer.
The lexer parses a token, which get_token() puts into a word_t, and
collect_words() saves the token text in its wordhash. For storage
efficiency, the wordhash preallocates storage for a batch of nodes and
strings. When the wordhash needs to store the token, it copies the
token's info into the preallocated storage. The word_t can then be
freed.
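To illustrate the copy-into-preallocated-storage step, here is a minimal
sketch. It is not bogofilter's wordhash code: tok_t, pool_t and the pool_*
functions are hypothetical names made up for this example, and the pool
holds only strings (no hash nodes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a word_t: a length plus a pointer to the text. */
typedef struct {
    size_t len;
    const unsigned char *text;
} tok_t;

/* Hypothetical string pool: one large allocation, handed out piecemeal. */
typedef struct {
    unsigned char *buf;
    size_t used;
    size_t cap;
} pool_t;

/* Allocate the pool once, up front. Returns 1 on success, 0 on failure. */
static int pool_init(pool_t *p, size_t cap)
{
    p->buf = malloc(cap);
    p->used = 0;
    p->cap = (p->buf != NULL) ? cap : 0;
    return p->buf != NULL;
}

/* Copy the token's text into the pool and NUL-terminate it.
 * Returns the stored copy, or NULL if the pool has no room left. */
static const unsigned char *pool_store(pool_t *p, const tok_t *t)
{
    assert(p->used <= p->cap);
    if (t->len >= p->cap - p->used)   /* need len + 1 bytes */
        return NULL;
    unsigned char *dst = p->buf + p->used;
    memcpy(dst, t->text, t->len);
    dst[t->len] = '\0';
    p->used += t->len + 1;
    return dst;
}

/* Release the whole pool with a single free(). */
static void pool_free(pool_t *p)
{
    free(p->buf);
    p->buf = NULL;
    p->used = p->cap = 0;
}
```

Once pool_store() returns, the caller's token is no longer needed, since
the pool owns its own copy of the text.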
Since the use of word_t is temporary and tokens have a maximum length
(MAXTOKENLEN), there's no need to dynamically allocate/free word_t -- a
statically defined one works just fine. This also saves all the
malloc/free calls and so is a bit faster.
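A rough sketch of the static-buffer idea follows. The names here
(token_buf_t, set_token, MAX_TOKEN_LEN) and the value 30 are assumptions
for illustration only, not what token.c actually contains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 30   /* assumed limit, for illustration only */

/* Hypothetical fixed-size token: the text lives inside the struct. */
typedef struct {
    size_t len;
    unsigned char text[MAX_TOKEN_LEN + 1];
} token_buf_t;

/* One buffer, reused for every token, so no malloc/free per token.
 * The price: it is not reentrant, and each call overwrites the last. */
static token_buf_t current_token;

/* Hypothetical helper: load the lexer's text into the static buffer.
 * Returns the buffer, or NULL if the text exceeds the fixed limit. */
static const token_buf_t *set_token(const char *src, size_t len)
{
    assert(src != NULL);
    if (len > MAX_TOKEN_LEN)
        return NULL;
    memcpy(current_token.text, src, len);
    current_token.text[len] = '\0';
    current_token.len = len;
    return &current_token;
}
```

This only works because the consumer copies the text out before the next
token is read; anything that holds on to the returned pointer will see it
change underneath.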
Message IDs are an exception to bogofilter's MAXTOKENLEN limit. As
they can be much longer, they get special treatment.
HTH,
David