TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Beating hash tables with trees? The ART-ful radix trie

152 点作者 HenryR超过 6 年前

8 条评论

jamii超过 6 年前
My experience has been that the vast majority of papers on data-structures are at best misleading, and at worst deliberately biased.<p>For example:<p>&gt; The hash table used by the authors of ART in their study was a chained hash table, but this kind of hash tables can be suboptimal in terms of space and performance due to their potentially high use of pointers.<p>&gt; Our experiments strongly indicate that neither ART nor Judy are competitive to the aforementioned hashing schemes in terms of performance, and, in the case of ART, sometimes not even in terms of space.<p><a href="https:&#x2F;&#x2F;www.victoralvarez.net&#x2F;papers&#x2F;A%20Comparison%20of%20Adaptive%20Radix%20Trees%20and%20Hash%20Tables%20-%20ICDE%202015.pdf" rel="nofollow">https:&#x2F;&#x2F;www.victoralvarez.net&#x2F;papers&#x2F;A%20Comparison%20of%20A...</a>
评论 #18422542 未加载
评论 #18422440 未加载
评论 #18422142 未加载
acidx超过 6 年前
Nice article and analysis! I&#x27;m actually considering scrapping the trie used in my project to something based off of this one, with some modifications:<p>For instance, find_node(c, Node48) could avoid the branch if a non-existing index points to an additional entry in child_ptrs that&#x27;s always NULL. Lookup would be comparable to the Node256 version.<p>Another thing that could be done, is to scrap the Node48 entirely, and implement two new structs to replace it: Node32 and Node64, and use respectively AVX2 and AVX512. These can be based off of the Node16 version. It remains to be seen if these will yield better performance than the branchless Node48 above, especially if power management kicks in when mixing AVX512 with older SIMD generations.<p>The trie in Lwan (<a href="https:&#x2F;&#x2F;lwan.ws" rel="nofollow">https:&#x2F;&#x2F;lwan.ws</a>) does an interesting trick to reduce the amount of memory used in the equivalent of a Node256: instead of 256 pointers to a node, it has only 8 pointers. Characters are hashed (MOD 8). The leaf node contains a linked list of key&#x2F;value pair, and an actual string comparison is performed at the end. (Lwan cheats here by avoiding a string comparison if the linked list contains only 1 element.) Works pretty well, as it&#x27;s part of the URL routing mechanism.<p>One other experiment I&#x27;ve been making with tries, is to use the idea of key compression and use it in a different way: slice it every 4 or 8 bytes, consider those bytes as an arbitrary integer, and add every chunk of it to a hashmap&lt;int, some_struct&gt;, building a chain for the next lookup in some_struct. The prototype I wrote works pretty well.
评论 #18424067 未加载
评论 #18421637 未加载
faragon超过 6 年前
A good point for both RB trees and linear-addressing hash tables is that they can be implemented with vectors( [1], [2]), allowing the case of initial reservation for N elements, so with a tricky implementation you could even have the data structure with one or zero allocations (e. g. allocate the tree or the hash table in the stack). For tries you could use many memory pools for the different node sizes, apply path compression, and even a LUT accelerator for reaching the Nth level, but hardly could be implemented using a vector.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;faragon&#x2F;libsrt&#x2F;blob&#x2F;master&#x2F;src&#x2F;saux&#x2F;stree.c" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;faragon&#x2F;libsrt&#x2F;blob&#x2F;master&#x2F;src&#x2F;saux&#x2F;stree...</a><p>[2] I&#x27;m implementing a key-value hash table that will be added to the same library as [1] with &quot;srt_hmap&quot; type, in one continuous allocation. Being able to use hash tables allocated both in the heap and in the stack (e.g. you could use a int32-int32 hash table allocated in the stack for computing the color frequency of a bitmap image). Being the HT performance 4 to 5x the performance of the RB-trees, including cost of rehashing - rehash implementation using techniques for avoiding moving all the data- (rehashing only available for the heap case).
评论 #18420400 未加载
namibj超过 6 年前
Just a friendly reminder that B-trees are often faster on modern microprocessors than RB-trees. See kbtree.h for a simple, yet fast example. I didn&#x27;t test it, but I&#x27;d assume B-tries would be rather efficient.
评论 #18421188 未加载
评论 #18422154 未加载
marknadal超过 6 年前
Radix trees are one of the most under utilized data structures.<p>They are great and have fantastic performance!<p>I implemented a custom on disk storage engine with a Radix format, and am getting on a low end MacBook Air 2015 about 3K&#x2F;acked writes to disk&#x2F;second! It is now the default at <a href="https:&#x2F;&#x2F;github.com&#x2F;amark&#x2F;gun" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;amark&#x2F;gun</a> the code is pretty short too.
repsilat超过 6 年前
&gt; <i>This is superior to binary-search: no branches (except for the test when bitfield is 0), and all the comparisons are done in parallel</i><p>Branchless binary search isn&#x27;t hard to implement if you know (or can bound) the number of elements statically. You just use the comparison result arithmetically instead of branching on it, and you unroll the loop.<p>Obviously a binary search can&#x27;t do comparisons in parallel, though.
评论 #18423182 未加载
laxk超过 6 年前
If somebody need a good&#x2F;fast&#x2F;well optimized Go implementation of the ART, check this out: <a href="https:&#x2F;&#x2F;github.com&#x2F;plar&#x2F;go-adaptive-radix-tree" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;plar&#x2F;go-adaptive-radix-tree</a> (disclosure: I&#x27;m the author)
saagarjha超过 6 年前
I had to implement a trie for an Aho-Corasick implementation a while back, and I just used a std::unordered_set&lt;unichar, std::unique_ptr&lt;trie_node&gt;&gt; to store the children (this was Objective-C++, so I was using UTF-16 characters taken from an NSString). Worked well enough for the effort I put into it.
评论 #18420369 未加载