oddity.
michael at optusnet.com.au
Mon Apr 14 23:59:21 CEST 2003
David Relson <relson at osagesoftware.com> writes:
> At 01:12 AM 4/14/03, michael at optusnet.com.au wrote:
>
> >In bogofilter.c:
> >    /* tokenize input text and save words in a wordhash. */
> >    do {
> >        collect_words(&wordhash, &wordcount, &cont);
> >        ++msgcount;
> >    } while(cont);
> >
> >Shouldn't that be free'ing wordhash somewhere before it overwrites it?
>
> bogofilter() is the message classification function.
[...]
> When classifying messages, the message body can contain lines starting
> with "^From ". In a mailbox those message body lines are escaped
> (typically as ">From "). Without the loop, such a line would
> terminate parsing the message.
>
> Summary: it's correct.
I understand why the loop is there, but collect_words() ignores its
'wordhash' input and allocates a new hash every time it's called.
The new hash is returned by overwriting the caller's 'wordhash' pointer.
So the loop above actually discards everything except the
words after the last 'From ' line, and leaks the hashes allocated
for all the blocks before it.
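To make the leak concrete, here is a toy reproduction of the loop shape. Everything in it (fake_hash_t, fake_collect, leaked_after_loop, the live_hashes counter) is a made-up stand-in, not bogofilter code; it only assumes that the collector allocates a fresh hash on every call and overwrites the caller's pointer, as described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for wordhash_t; the real type lives in bogofilter. */
typedef struct { int nwords; } fake_hash_t;

static int live_hashes = 0;   /* hashes allocated and not yet freed */

/* Mimics the behaviour described above: ignores *out on entry,
 * allocates a fresh hash, and overwrites *out with it. */
static void fake_collect(fake_hash_t **out, int *cont, int *blocks_left)
{
    fake_hash_t *h = malloc(sizeof *h);
    assert(h != NULL);
    h->nwords = 1;
    ++live_hashes;
    *out = h;                         /* the previous pointer is lost here */
    *cont = (--*blocks_left > 0);     /* more "From " blocks to read? */
}

/* Runs the original loop shape over `blocks` segments, frees only the
 * final hash (all the caller can still reach), and returns how many
 * hashes remain allocated, i.e. how many were leaked. */
static int leaked_after_loop(int blocks)
{
    fake_hash_t *hash = NULL;
    int cont = 0;

    live_hashes = 0;
    do {
        fake_collect(&hash, &cont, &blocks);
    } while (cont);

    free(hash);
    --live_hashes;
    return live_hashes;
}
```

With a single block nothing is lost; with three blocks, the two earlier hashes are unreachable by the time the loop exits.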
It sounds like you really want something like:
    wordhash_t * wordhash = NULL;
    long wordcount = 0;
    ...
    collect_reset();
    do {
        wordhash_t * temp_hash;
        long temp_count;
        collect_words(&temp_hash, &temp_count, &cont);
        if (!wordhash) {
            wordhash = temp_hash;
            wordcount = temp_count;
        } else { /* Merge the new hash with the existing one */
            wordhash_sort(temp_hash);
            add_hash(wordhash, temp_hash);
            wordhash_free(temp_hash);
            wordcount += temp_count;
        }
    } while(cont);
    ...
yes? (note: written by eye and not compiled; eat with care; do not machine wash; if
swallowed, seek medical advice).
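For anyone who wants to try the merge pattern in isolation, below is a small sketch that compiles on its own. It does not use the real bogofilter API: toy_hash_t, toy_collect, toy_merge and toy_classify_input are hypothetical stand-ins, the "hash" is just a count array over a tiny vocabulary, and the input is a flat array of token ids where -1 marks a 'From ' boundary and -2 marks end of input. All of that is assumed purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define VOCAB 4   /* toy vocabulary size (assumption for the sketch) */

/* Hypothetical stand-in for wordhash_t: one counter per token id. */
typedef struct { long count[VOCAB]; } toy_hash_t;

static toy_hash_t *toy_new(void)
{
    toy_hash_t *h = calloc(1, sizeof *h);
    assert(h != NULL);
    return h;
}

/* Stand-in for add_hash(): fold the counts of src into dst. */
static void toy_merge(toy_hash_t *dst, const toy_hash_t *src)
{
    for (int i = 0; i < VOCAB; ++i)
        dst->count[i] += src->count[i];
}

/* Stand-in for collect_words(): always allocates a fresh hash for one
 * segment, stores it in *out, and sets *cont if another segment follows. */
static void toy_collect(const int **cursor, toy_hash_t **out,
                        long *n, int *cont)
{
    toy_hash_t *h = toy_new();
    const int *p = *cursor;
    long words = 0;

    for (; *p >= 0; ++p) {
        h->count[*p % VOCAB]++;
        ++words;
    }
    *cont = (*p == -1);   /* -1: boundary, more to come; -2: end of input */
    *cursor = p + 1;
    *out = h;
    *n = words;
}

/* The merge loop: keep the first hash, fold each later one into it and
 * free the temporary. The caller owns (and must free) the result. */
static toy_hash_t *toy_classify_input(const int *tokens, long *total)
{
    toy_hash_t *hash = NULL;
    int cont = 0;

    *total = 0;
    do {
        toy_hash_t *temp;
        long temp_count;

        toy_collect(&tokens, &temp, &temp_count, &cont);
        if (!hash) {
            hash = temp;
        } else {
            toy_merge(hash, temp);
            free(temp);
        }
        *total += temp_count;
    } while (cont);
    return hash;
}
```

Feeding it the tokens {0, 1, -1, 1, 2, -2} gives one hash covering both segments, with token 1 counted twice and a total of four words; nothing is left allocated once the caller frees the result.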
Michael.