oddity.
David Relson
relson at osagesoftware.com
Tue Apr 15 01:25:30 CEST 2003
At 05:59 PM 4/14/03, michael at optusnet.com.au wrote:
>I understand why the loop is there, but collect_words() ignores its
>'wordhash' input, and allocates a new hash every time it's called.
>It's returned by overwriting the 'wordhash' input.
wordhash is an output, not an input. There is no value in wordhash before
calling collect_words(), hence nothing to lose.
>So the loop above actually ignores everything except the
>words after the last 'From ' and leaks memory for all the
>blocks before that From.
What's there is correct, AFAIK. When you have a counter-example to prove
me wrong, send it.
>It sounds like you really want something like:
>
>    wordhash_t * wordhash = NULL;
>    long wordcount = 0;
>    ...
>    collect_reset();
>    do {
>        wordhash_t * temp_hash;
>        long temp_count;
>
>        collect_words(&temp_hash, &temp_count, &cont);
>        if (!wordhash) {
>            wordhash = temp_hash;
>            wordcount = temp_count;
>        } else { /* Merge the new hash with the existing one */
>            wordhash_sort(temp_hash);
>            add_hash(wordhash, temp_hash);
>            wordhash_free(temp_hash);
>            wordcount += temp_count;
>        }
>    } while(cont);
>    ...
>
>yes? (note: written by eye and not compiled; eat with care; do not machine
>wash; if swallowed, seek medical advice).
Michael,
I was preparing a long reply to tell you why you were wrong and then
decided to test the actual behavior with a specially crafted message. To
my chagrin, I found that you are correct. Whatever appears before a ^From
line is lost. At least I never sent the message I started to write.
The proper solution may be to create the wordhash before the call to
collect_words() and let the function just add to it. Then the higher level
routine is responsible for allocation, management, and deallocation of
wordhashes. That's a better design than allocating at a lower level and
deallocating at the higher level. I need to do some experimentation to
determine exactly what needs to be done and to make sure speed doesn't suffer.
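A rough sketch of that ownership model, to make the idea concrete. Everything named toy_* here is a hypothetical stand-in (a small fixed-size word list, not bogofilter's real wordhash or collect_words() interface): the caller allocates the hash once, the collector only adds to it, and the caller frees it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a wordhash: a small fixed-capacity word/count list.
 * Hypothetical; bogofilter's real wordhash_t is a different structure. */
#define TOY_MAX_WORDS 64
#define TOY_MAX_LEN   32

typedef struct {
    char   words[TOY_MAX_WORDS][TOY_MAX_LEN];
    long   counts[TOY_MAX_WORDS];
    size_t used;
} toy_hash_t;

/* The higher-level routine allocates the hash ... */
toy_hash_t *toy_hash_new(void)
{
    return calloc(1, sizeof(toy_hash_t));
}

/* ... and the higher-level routine frees it. */
void toy_hash_free(toy_hash_t *h)
{
    free(h);
}

/* Add one occurrence of word. Words are compared and stored on their
 * first TOY_MAX_LEN - 1 characters; new words are silently dropped
 * once the table is full. */
void toy_hash_add(toy_hash_t *h, const char *word)
{
    size_t i;

    for (i = 0; i < h->used; i++) {
        if (strncmp(h->words[i], word, TOY_MAX_LEN - 1) == 0) {
            h->counts[i]++;
            return;
        }
    }
    if (h->used < TOY_MAX_WORDS) {
        strncpy(h->words[h->used], word, TOY_MAX_LEN - 1);
        h->words[h->used][TOY_MAX_LEN - 1] = '\0';
        h->counts[h->used] = 1;
        h->used++;
    }
}

/* Return how many times word has been added, or 0 if never. */
long toy_hash_count(const toy_hash_t *h, const char *word)
{
    size_t i;

    for (i = 0; i < h->used; i++) {
        if (strncmp(h->words[i], word, TOY_MAX_LEN - 1) == 0)
            return h->counts[i];
    }
    return 0;
}

/* Hypothetical collector: splits one block of text on spaces and
 * newlines and ADDS the tokens to an existing hash. It never allocates
 * or replaces the hash. Returns the number of tokens seen; input past
 * 255 characters is ignored in this toy version. */
long toy_collect_words(toy_hash_t *h, const char *block)
{
    char  buf[256];
    char *tok;
    long  n = 0;

    assert(h != NULL && block != NULL);
    strncpy(buf, block, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (tok = strtok(buf, " \n"); tok != NULL; tok = strtok(NULL, " \n")) {
        toy_hash_add(h, tok);
        n++;
    }
    return n;
}
```

With this shape the calling loop creates one hash with toy_hash_new(), calls toy_collect_words() once per block (one block per ^From-delimited chunk), and calls toy_hash_free() at the end, so words from earlier blocks accumulate instead of being overwritten and nothing is allocated at the lower level.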
David