Recent changes to token.c
relson at osagesoftware.com
Sun Mar 13 11:56:55 EST 2005
FYI, when I was looking at bogofilter's use of heap storage (malloc and
free), I noticed an interesting pattern involving the creation and saving
of tokens.
Typically, collect_words() calls get_token() which calls the lexer.
The lexer parses a token, which get_token() puts into a word_t, and
collect_words() saves the token text in its wordhash. For storage
efficiency, the wordhash preallocates storage for a bunch of nodes and
strings. When the wordhash needs to store the token, it copies the
token's info into the preallocated storage. The word_t can then be
reused.
Since the use of word_t is temporary and tokens have maximum lengths
(MAXTOKEN), there's no need to dynamically allocate/free word_t -- a
statically defined one works just fine. This also eliminates all the
malloc/free calls, which makes it a bit faster.
Message IDs are an exception to bogofilter's MAXTOKENLEN constant. As
they can be much longer, they get special treatment.
Bogofilter-dev mailing list
Bogofilter-dev at bogofilter.org