wordhashes [was: time test]

Gyepi SAM gyepi at praxis-sw.com
Mon Nov 25 15:48:17 CET 2002


On Mon, Nov 25, 2002 at 08:31:09AM -0500, David Relson wrote:
> I thought bogofilter built a wordhash for each message (using 
> collect_words()) and release the wordhash (freed the memory) at the end of 
> the message.

The wordhash has to persist across messages: collect_words() slurps up
the entire input before returning, so the wordhash ends up containing
all the words from every message.

> Matthias' figures indicate that the release/free is 
> missing.  If this is so, we have a defect in the code.  I don't think it's 
> a big deal, because bogofilter is typically called for a single 
> message.  However we ought to check the code and fix the problem.

I agree that this is a problem, but not a big deal.
Nonetheless, we should fix it.

In addition to the possible solution I outlined yesterday, here's
another obvious one.

In collect_words(), create a master wordhash. Then, for each message,
create a message-specific wordhash, get its words, transfer the words
into the master wordhash, and free the message-specific wordhash. At the
end, just return the master wordhash.
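
As a rough illustration of the steps above, here is a minimal sketch. It
does not use bogofilter's actual wordhash API: the wh_* names, the
linked-list layout, and collect_words_sketch() are all hypothetical
stand-ins, and each message is assumed to arrive as a plain
whitespace-separated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a wordhash: a linked list of (word, count)
 * nodes. A real wordhash would be a hash table. */
typedef struct wh_node {
    char *word;
    int count;
    struct wh_node *next;
} wh_node;

typedef struct { wh_node *head; } wh_table;

wh_table *wh_new(void)
{
    wh_table *t = calloc(1, sizeof *t);
    assert(t != NULL);          /* sketch only: no real error handling */
    return t;
}

/* Add n occurrences of word, creating its node if it is not present. */
void wh_add(wh_table *t, const char *word, int n)
{
    wh_node *p;
    for (p = t->head; p != NULL; p = p->next) {
        if (strcmp(p->word, word) == 0) {
            p->count += n;
            return;
        }
    }
    p = malloc(sizeof *p);
    assert(p != NULL);
    p->word = malloc(strlen(word) + 1);
    assert(p->word != NULL);
    strcpy(p->word, word);
    p->count = n;
    p->next = t->head;
    t->head = p;
}

int wh_count(const wh_table *t, const char *word)
{
    const wh_node *p;
    for (p = t->head; p != NULL; p = p->next)
        if (strcmp(p->word, word) == 0)
            return p->count;
    return 0;
}

void wh_free(wh_table *t)
{
    wh_node *p = t->head;
    while (p != NULL) {
        wh_node *next = p->next;
        free(p->word);
        free(p);
        p = next;
    }
    free(t);
}

/* Split text on whitespace and add each word once per occurrence.
 * Words longer than 63 bytes are truncated. */
void wh_add_text(wh_table *t, const char *text)
{
    char word[64];
    for (;;) {
        text += strspn(text, " \t\n");
        size_t len = strcspn(text, " \t\n");
        if (len == 0)
            break;
        size_t n = len < sizeof word ? len : sizeof word - 1;
        memcpy(word, text, n);
        word[n] = '\0';
        wh_add(t, word, 1);
        text += len;
    }
}

/* One short-lived wordhash per message, copied into the master and
 * freed before the next message is read. */
wh_table *collect_words_sketch(const char *messages[], size_t nmsgs)
{
    wh_table *master = wh_new();
    for (size_t i = 0; i < nmsgs; i++) {
        wh_table *msg = wh_new();
        wh_add_text(msg, messages[i]);
        for (wh_node *p = msg->head; p != NULL; p = p->next)
            wh_add(master, p->word, p->count);   /* the per-node copy */
        wh_free(msg);
    }
    return master;
}
```

The copy loop inside collect_words_sketch() is the extra cost of this
approach: every node of every per-message wordhash is visited once more
before that wordhash is freed. The caller owns the returned master
wordhash and releases it with wh_free().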

They'll both work. Yesterday's solution is more complicated but also
faster, since we don't copy every wordhash node into another wordhash.

Unless someone offers a better one, I am inclined to implement the first
solution.

-Gyepi



