TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Show HN: Integer Map Data Structure

91 点作者 billziss超过 1 年前
This project presents a new data structure for storing ordered integer maps. The proposed data structure is a compressive, cache-friendly, radix tree that has performance comparable to an unordered map (`std::unordered_map`) and is an order of magnitude faster than an ordered map (`std::map`).

11 条评论

bts超过 1 年前
FWIW there is prior art here. e.g. see IntMap in Haskell: <a href="https:&#x2F;&#x2F;hackage.haskell.org&#x2F;package&#x2F;containers-0.7&#x2F;docs&#x2F;Data-IntMap-Strict.html" rel="nofollow">https:&#x2F;&#x2F;hackage.haskell.org&#x2F;package&#x2F;containers-0.7&#x2F;docs&#x2F;Data...</a>
评论 #39127312 未加载
评论 #39126161 未加载
jemfinch超过 1 年前
This really doesn&#x27;t seem to be comparing to comparable data structures. For int map specializations like this, the optimized alternatives are things like Judy (which is looking quite aged these days) or roaring bitmaps, not to mention that any C++ developer using &quot;ordinary&quot; maps will be using absl&#x27;s SwissTable (flat_hash_map) or folly&#x27;s F14 (F14FastMap) or perhaps absl::btree_map if order is important. Comparisons to std::map and std::unordered_map are simply too naive to make the case for this data structure.
评论 #39127289 未加载
lichtenberger超过 1 年前
We&#x27;re using a similar trie structure as the main document (node) index in SirixDB[1]. Lately, I got some inspiration for different page-sizes based on the ART and HAMT basically for the rightmost inner pages (as the node-IDs are generated by a simple sequence generator and thus also all inner pages (we call them IndirectPage) except for the rightmost are fully occupied (the tree height is adapted dynamically depending on the size of the stored data. Currently, always 1024 references are stored to indirect child pages, but I&#x27;ll experiment with smaller sized, as the inner nodes are simply copied for each new revision, whereas the leaf pages storing the actual data are versioned themselfes with a novel sliding snapshot algorithm.<p>You can simply compute from a unique nodeId each data is assigned (64bit) the page and reference to traverse on each level in the trie through some bit shifting.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;sirixdb&#x2F;sirix">https:&#x2F;&#x2F;github.com&#x2F;sirixdb&#x2F;sirix</a>
NWoodsman超过 1 年前
Also will throw in to the mix, in C#:<p><a href="https:&#x2F;&#x2F;julesjacobs.com&#x2F;2014&#x2F;11&#x2F;11&#x2F;immutable-vectors-csharp.html" rel="nofollow">https:&#x2F;&#x2F;julesjacobs.com&#x2F;2014&#x2F;11&#x2F;11&#x2F;immutable-vectors-csharp....</a><p>His implementation uses buffers of capacity 32, generics, and bit shifting to do lookups.
winrid超过 1 年前
Neat, thank you! I&#x27;d love to see how it compares to the libgdx IntMap[0].<p>[0] <a href="https:&#x2F;&#x2F;github.com&#x2F;libgdx&#x2F;libgdx&#x2F;blob&#x2F;master&#x2F;gdx&#x2F;src&#x2F;com&#x2F;badlogic&#x2F;gdx&#x2F;utils&#x2F;IntMap.java">https:&#x2F;&#x2F;github.com&#x2F;libgdx&#x2F;libgdx&#x2F;blob&#x2F;master&#x2F;gdx&#x2F;src&#x2F;com&#x2F;bad...</a>
评论 #39125543 未加载
AaronFriel超过 1 年前
Interesting! Reminds me a great deal of Judy Arrays: <a href="https:&#x2F;&#x2F;en.m.wikipedia.org&#x2F;wiki&#x2F;Judy_array" rel="nofollow">https:&#x2F;&#x2F;en.m.wikipedia.org&#x2F;wiki&#x2F;Judy_array</a><p>Judy Arrays are a radix trie with branching and a few node types designed to be cache line width optimized.
repsilat超过 1 年前
Looks rad, I was going to look into some b-trees for a use-case where I need an ordered map of things similar to integers and this might be better.<p>I couldn&#x27;t immediately see, is there mention of whether insertions invalidate iterators? Maybe not strictly needed for my use-case but good to know.
ww520超过 1 年前
This looks very good. The idea of using a subnet-mask style to compute the prefix of a node is pretty novel. I haven&#x27;t seen anything like it. The choice of span factor of 16 is a good compromise between node size and tree depth. The node slot packing is amazing. Actually if you relax the restriction on 64-byte node to 128-byte node, you can get 64 bits per slot and will get a much higher limit for the item count. Newer CPU&#x27;s are starting to support 128-byte cache line.
评论 #39127529 未加载
ursusmaritimus超过 1 年前
Interesting, but the summary does not mention an important fact: the data structure can contain at most 67108864 items, which is a quite low limit.
评论 #39127180 未加载
评论 #39130499 未加载
JonChesterfield超过 1 年前
Appears to be a 16 way branching trie which completely misses both advantages of tree structures over hashes:<p>1&#x2F; This tree is mutable, insert doesn&#x27;t give you a new tree via path copying<p>2&#x2F; union&#x2F;intersection style operations can be sublinear. None of the batch operations are implemented
notfed超过 1 年前
It&#x27;d be nice to include djb&#x27;s crit-bit tree implementation (linked to from the intro) in the benchmarks (&quot;Performance Test&quot; section).