oddity.

michael at optusnet.com.au
Mon Apr 14 23:59:21 CEST 2003


David Relson <relson at osagesoftware.com> writes:
> At 01:12 AM 4/14/03, michael at optusnet.com.au wrote:
> 
> >In bogofilter.c:
> >     /* tokenize input text and save words in a wordhash. */
> >     do {
> >         collect_words(&wordhash, &wordcount, &cont);
> >         ++msgcount;
> >     } while(cont);
> >
> >Shouldn't that be free'ing wordhash somewhere before it overwrites it?
> 
> bogofilter() is the message classification function. 
[...]
> When classifying messages, the message body can contain lines starting
> with "^From ".  In a mailbox those message body lines are escaped
> (typically as ">From ").  Without the loop, such a line would
> terminate parsing the message.
> 
> Summary:  it's correct.

I understand why the loop is there, but collect_words() ignores the
'wordhash' it is passed and allocates a new hash every time it's called.
The new hash is returned by overwriting the caller's 'wordhash' pointer.

So the loop above actually discards everything except the
words after the last 'From ' line, and leaks the hashes allocated
for all the blocks before it.
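
To make the overwrite-and-leak pattern concrete, here is a toy model of it.
Nothing in it is bogofilter code: toy_hash_t, toy_collect() and the fixed
pool are hypothetical stand-ins, and the pool replaces malloc() only so that
the number of unreleased hashes can be counted.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 8

/* Hypothetical stand-in for wordhash_t; not bogofilter's real type. */
typedef struct { int in_use; int block_id; } toy_hash_t;

static toy_hash_t pool[POOL_SIZE];

/* Hand out the first unused slot, or NULL if the pool is exhausted. */
static toy_hash_t *toy_alloc(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) { pool[i].in_use = 1; return &pool[i]; }
    }
    return NULL;
}

static void toy_free(toy_hash_t *h) { h->in_use = 0; }

/* Number of hashes that were allocated and never released. */
static int toy_live(void)
{
    int n = 0;
    for (int i = 0; i < POOL_SIZE; i++) n += pool[i].in_use;
    return n;
}

/* Mimics the behaviour described above: the incoming *out is ignored,
 * a fresh hash is allocated, and the caller's pointer is overwritten. */
static void toy_collect(toy_hash_t **out, int block_id, int last_block, int *cont)
{
    toy_hash_t *h = toy_alloc();
    if (h != NULL) h->block_id = block_id;
    *out = h;
    *cont = (block_id < last_block);
}

/* Runs a loop of the same shape as the one in bogofilter.c over nblocks
 * blocks (nblocks <= POOL_SIZE), frees the final hash, and returns how
 * many hashes are still unreleased. *surviving receives the id of the
 * only block whose hash the caller could still reach. */
static int leaked_after_loop(int nblocks, int *surviving)
{
    toy_hash_t *hash = NULL;
    int cont, block = 0;

    memset(pool, 0, sizeof pool);
    do {
        toy_collect(&hash, ++block, nblocks, &cont);
    } while (cont);

    *surviving = hash->block_id;
    toy_free(hash);
    return toy_live();
}
```

With three blocks, leaked_after_loop() reports that only block 3 is still
reachable and that two hashes were never released, which is the behaviour
I think the real loop has.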

It sounds like you really want something like:

        wordhash_t * wordhash = NULL;
        long wordcount = 0;
        ...
        collect_reset();
        do {
                wordhash_t * temp_hash;
                long temp_count;

                collect_words(&temp_hash, &temp_count, &cont);
                if (!wordhash) {
                        wordhash = temp_hash;
                        wordcount = temp_count;
                } else { /* Merge the new hash with the existing one */
                        wordhash_sort(temp_hash);
                        add_hash(wordhash, temp_hash);
                        wordhash_free(temp_hash);
                        wordcount += temp_count;
                }
        } while(cont);
        ...

yes? (note: written by eye and not compiled; eat with care; do not machine wash; if
swallowed, seek medical advice).
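
For anyone who wants something that does compile, here is the same merge
idea as a standalone toy. It is only a sketch under assumptions: toy_hash_t,
toy_add(), toy_collect() and collect_all() are made-up names, the "hash" is
a small linear table, and a token equal to "From " stands in for a block
boundary. It does not use bogofilter's wordhash API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 32

/* Hypothetical stand-in for wordhash_t: a tiny word -> frequency table. */
typedef struct {
    const char *word[MAX_WORDS];
    long freq[MAX_WORDS];
    int n;
} toy_hash_t;

/* Add f occurrences of w, creating the entry if needed. */
static void toy_add(toy_hash_t *h, const char *w, long f)
{
    for (int i = 0; i < h->n; i++) {
        if (strcmp(h->word[i], w) == 0) { h->freq[i] += f; return; }
    }
    if (h->n < MAX_WORDS) { h->word[h->n] = w; h->freq[h->n] = f; h->n++; }
}

static long toy_freq(const toy_hash_t *h, const char *w)
{
    for (int i = 0; i < h->n; i++) {
        if (strcmp(h->word[i], w) == 0) return h->freq[i];
    }
    return 0;
}

/* Like the collect_words() described above: always allocates a new table
 * and hands it back through *out. Sets *cont when it stopped at a
 * "From " boundary rather than at the end of the input. */
static void toy_collect(const char **tok, int ntok, int *pos,
                        toy_hash_t **out, long *count, int *cont)
{
    toy_hash_t *h = calloc(1, sizeof *h);
    if (h == NULL) abort();
    *count = 0;
    *cont = 0;
    while (*pos < ntok) {
        const char *t = tok[(*pos)++];
        if (strcmp(t, "From ") == 0) { *cont = 1; break; }
        toy_add(h, t, 1);
        (*count)++;
    }
    *out = h;
}

/* The merge loop: keep the first table, fold each later one into it,
 * and free the temporary so that nothing is dropped or leaked. */
static toy_hash_t *collect_all(const char **tok, int ntok, long *total)
{
    toy_hash_t *merged = NULL;
    int pos = 0, cont;

    *total = 0;
    do {
        toy_hash_t *tmp;
        long n;

        toy_collect(tok, ntok, &pos, &tmp, &n, &cont);
        if (merged == NULL) {
            merged = tmp;
        } else {
            for (int i = 0; i < tmp->n; i++)
                toy_add(merged, tmp->word[i], tmp->freq[i]);
            free(tmp);
        }
        *total += n;
    } while (cont);
    return merged;
}
```

Feeding it the tokens "buy", "now", "From ", "buy", "cheap" gives one table
with buy=2, now=1, cheap=1 and a total of 4, so the words before the
"From " line are kept instead of being thrown away.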

Michael.



