hash table to replace Judy

Thu Sep 19 03:59:03 CEST 2002

On Thu, Sep 19, 2002 at 12:56:56AM +0200, Matthias Andree wrote:
> On Wed, 18 Sep 2002, Gyepi SAM wrote:
> 
> > > I'm not sure if I recall this correctly, but did perchance DB offer a
> > > "no-file" hash mode? If so, that might be far easier, because we depend
> > > on DB anyways.
> > 
> > It does, and would be easier, but also slower. Incrementing a word requires that
> > 
> > 1. Retrieve the current value, if any
> > 2. Set or change the value
> > 3. Write it back to the database (which also involves another search)
> 
> I'm still a BerkeleyDB ignorant, but I read about "cursors" in the docs
> -- which are used at least for transactional completion. Would not this
> save the second search? Or would the other search still be required
> because some other location may have changed?

cursors are for sequential access, which is too slow for what we're doing.
Note that cursors are not required for transactions. Also, the second search is implicit
in that when you call DBT->put(), the system has to search for the correct location to place
the new item. It can only be avoided by having the first search operation insert a new item if
necessary and, in either case, returning a pointer to the associated data structure.

We have to keep in mind that we actually have different needs at different times.
When registering words, we need a structure that can perform fast lookups, inserts, updates,
and supports iteration so we can write the data to disk when complete.

When calculating spamicity, we then have the additional need for a system that can perform fast
disk based lookups.

It may be possible to use the same system for both, but I don't think it would necessarily be a good
solution.

-Gyepi