TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Building a faster hash table for high performance SQL joins

82 点作者 nhourcard超过 1 年前

5 条评论

senderista超过 1 年前
Since the blog post mentioned a PR to replace linear probing with Robin Hood, I just wanted to mention that I found bidirectional linear probing to outperform Robin Hood across the board in my Java integer set benchmarks:<p><a href="https:&#x2F;&#x2F;github.com&#x2F;senderista&#x2F;hashtable-benchmarks&#x2F;blob&#x2F;master&#x2F;src&#x2F;main&#x2F;java&#x2F;set&#x2F;int64&#x2F;BLPLongHashSet.java">https:&#x2F;&#x2F;github.com&#x2F;senderista&#x2F;hashtable-benchmarks&#x2F;blob&#x2F;mast...</a><p><a href="https:&#x2F;&#x2F;github.com&#x2F;senderista&#x2F;hashtable-benchmarks&#x2F;wiki&#x2F;64-bit-benchmarks">https:&#x2F;&#x2F;github.com&#x2F;senderista&#x2F;hashtable-benchmarks&#x2F;wiki&#x2F;64-b...</a>
评论 #38714074 未加载
评论 #38712470 未加载
评论 #38715386 未加载
评论 #38712487 未加载
评论 #38714141 未加载
gavinray超过 1 年前
I always enjoy reading stuff written by Andrey, he&#x27;s a brilliant fellow for sure.<p>Can highly recommend his personal blog as well: <a href="https:&#x2F;&#x2F;puzpuzpuz.dev&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;puzpuzpuz.dev&#x2F;</a>
评论 #38712512 未加载
_a_a_a_超过 1 年前
From article<p>&quot;Imagine that we run this query over a few hundred million rows. This means at least a few hundred million hash table operations. As you might imagine, a slow hash table would make for a slower query. A faster hash table? Faster queries!&quot;<p>I&#x27;ll read the article properly after this, this is just a quick skim, but I can&#x27;t see this quote can be correct. Unless I&#x27;m missing something, hashing function is fast compared to random bouncing around inside ram – very much faster then random memory accesses. So I can&#x27;t see how it make a difference.<p>Okay, I&#x27;ll read the article now…<p>Edit:<p>&quot;If you insert &quot;John&quot; and then &quot;Jane&quot; string keys into a FastMap, then that would later become the iteration order. While it doesn&#x27;t sound like a big deal for most applications, this guarantee is important in the database world.<p>If the underlying table data or index-based access returns sorted data, then we may want to keep the order to avoid having to sort the result set. This is helpful in case of a query with an ORDER BY clause. Performance-wise, direct iteration over the heap is also beneficial as it means sequential memory access.&quot;<p>but &quot;...if the underlying table data or index-based access returns sorted data...&quot; Then you&#x27;ve got sorted data, in which case use a merge join instead of a hash join surely.
评论 #38712623 未加载
rkerno超过 1 年前
Hi, I&#x27;m curious how you deal with the potential for hash collisions across a large data set - is that a post-join check?
评论 #38718106 未加载
pixelpoet超过 1 年前
Serious question, if performance is the lynchpin, why write it in Java?<p>Especially considering they use unsafe &quot;heavily&quot;, for big joins they could easily just call out to some native code if the surrounding code reaaaaally must be Java (again, why?). It&#x27;s the worst of both worlds using unsafe Java: you don&#x27;t get native speed, there&#x27;s loads of memory overhead from everything being an Object (besides the rest of the VM stuff), and get to &quot;enjoy&quot; GC pauses in the middle of your hot loops, and with fewer safety guarantees than something like Rust.
评论 #38711787 未加载
评论 #38712586 未加载
评论 #38711764 未加载
评论 #38712446 未加载