
How Google Sparsehash achieves two bits of overhead per entry using sparsetable

54 points by Smerity almost 10 years ago

5 comments

eutectic almost 10 years ago
Note that the maximum fill factor of hashtables can be increased well above 90%, with performance comparable to or better than standard open addressing, using Robin Hood collision resolution with backwards shift deletion. The hashtable in the Rust standard library works like this, and there is an argument to be made that it should become standard.

Of course, this doesn't do away with the need to resize, so it might not be appropriate for low-latency or memory-constrained systems. I wonder whether a tree-based data structure like a crit-bit tree or a HAMT might be able to do similar things to sparsehash with better performance.
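A minimal sketch of Robin Hood insertion and backwards shift deletion, assuming a simple open-addressed set of non-negative integers where the key doubles as its own hash and no resizing is shown; this is illustrative only, not the Rust standard library's implementation:

    // Minimal Robin Hood open-addressing sketch (illustrative only).
    // Keys are non-negative and act as their own hash; -1 marks an empty
    // slot; no resizing, so the table is assumed never to fill up.
    #include <cstdint>
    #include <utility>
    #include <vector>

    struct RobinHoodSet {
        std::vector<int64_t> slots;  // -1 == empty
        explicit RobinHoodSet(size_t cap) : slots(cap, -1) {}

        // How far the entry at `pos` has been pushed from its home slot.
        size_t probe_distance(size_t pos, int64_t key) const {
            size_t home = static_cast<size_t>(key) % slots.size();
            return (pos + slots.size() - home) % slots.size();
        }

        void insert(int64_t key) {
            size_t pos = static_cast<size_t>(key) % slots.size();
            size_t dist = 0;
            while (true) {
                if (slots[pos] == -1) { slots[pos] = key; return; }
                if (slots[pos] == key) return;  // already present
                // Robin Hood rule: steal the slot from a "richer" entry,
                // i.e. one that sits closer to its home than we do.
                size_t existing = probe_distance(pos, slots[pos]);
                if (existing < dist) {
                    std::swap(key, slots[pos]);
                    dist = existing;
                }
                pos = (pos + 1) % slots.size();
                ++dist;
            }
        }

        void erase(int64_t key) {
            size_t pos = static_cast<size_t>(key) % slots.size();
            while (slots[pos] != key) {
                if (slots[pos] == -1) return;  // not present
                pos = (pos + 1) % slots.size();
            }
            // Backwards shift deletion: instead of a tombstone, pull the
            // following entries back one slot until we hit an empty slot
            // or an entry that is already in its home position.
            size_t next = (pos + 1) % slots.size();
            while (slots[next] != -1 && probe_distance(next, slots[next]) > 0) {
                slots[pos] = slots[next];
                pos = next;
                next = (next + 1) % slots.size();
            }
            slots[pos] = -1;
        }
    };

The swap on insertion evens out probe distances across the table, which is what keeps lookups cheap even at very high load factors.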
im3w1l almost 10 years ago
tl;dr: Open addressing. Group adjacent slots into buckets. Every bucket consists of an occupancy bitmap (1 bit of overhead per element) and a pointer to the bucket's elements (ptrsize / nElementsPerBucket bits of overhead per element).
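A rough sketch of that layout, assuming a 48-slot group packed into a 64-bit bitmap; the types, group size, and names are illustrative, not the actual Sparsehash code:

    // Sketch of a sparsetable-style group: a 48-slot bucket stored as a
    // 48-bit occupancy bitmap plus a pointer to a tightly packed array
    // holding only the occupied slots. Empty slots cost one bit each.
    // (__builtin_popcountll is the GCC/Clang intrinsic; C++20 has std::popcount.)
    #include <cstdint>

    struct SparseGroup {
        static const int kGroupSize = 48;
        uint64_t bitmap = 0;       // bit i set => logical slot i is occupied
        int32_t* items = nullptr;  // packed array, one entry per set bit

        bool occupied(int i) const { return (bitmap >> i) & 1; }

        // Rank: number of occupied slots strictly below i, which is exactly
        // slot i's index into the packed `items` array.
        int rank(int i) const {
            return __builtin_popcountll(bitmap & ((uint64_t(1) << i) - 1));
        }

        int32_t* find(int i) {
            return occupied(i) ? &items[rank(i)] : nullptr;
        }
    };

A lookup turns a logical slot index into a packed index with one popcount, and an empty slot costs only its bit plus an amortized share of the group's pointer.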
dspeyer almost 10 years ago
So we want to insert a value. We find an empty slot. We find the bucket it's in. Do we have to realloc the bucket and move all downstream data down by one? That sounds slow.
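For concreteness, here is roughly what inserting into one group could look like under the bitmap-plus-packed-array layout sketched above (again illustrative, not the actual Sparsehash code); the shift is bounded by the group size rather than by the whole table:

    // Continues the SparseGroup sketch above: inserting into logical slot i
    // grows the packed array by one and shifts the entries after the
    // insertion point. The copy is bounded by the group size (48 here),
    // not by the size of the whole table. Error handling omitted.
    #include <cstdlib>
    #include <cstring>

    void group_insert(SparseGroup& g, int i, int32_t value) {
        if (g.occupied(i)) { g.items[g.rank(i)] = value; return; }
        int n   = __builtin_popcountll(g.bitmap);  // entries currently stored
        int pos = g.rank(i);                       // where the new entry goes
        int32_t* grown = static_cast<int32_t*>(
            std::realloc(g.items, (n + 1) * sizeof(int32_t)));
        std::memmove(grown + pos + 1, grown + pos, (n - pos) * sizeof(int32_t));
        grown[pos] = value;
        g.items = grown;
        g.bitmap |= uint64_t(1) << i;
    }

Keeping groups small bounds each realloc-and-shift, which is the usual trade-off in this layout: slower insertion than a flat table in exchange for much lower memory overhead.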
TheLoneWolfling almost 10 years ago
Any idea how much space overhead malloc has with entries as small as this? (<1 to 4> * 8 bytes)

Seems to me it could easily be a significant amount of overhead.
signa11 almost 10 years ago
In case folks are interested in this topic, I would heartily recommend Mihai Pătraşcu's blog for some theory :)