
Faster hash joiner with vectorized execution

120 points by jordanlewis over 6 years ago

9 comments

angelachang27 over 6 years ago
Hey everyone, I'm the intern who did this work, happy to answer any questions if you have them!
Comment #19049346 not loaded
Comment #19050287 not loaded
Comment #19052808 not loaded
obl over 6 years ago
The "core" of the trick is nice: amortizing interpreter dispatch over many items (ignoring the column layout/SIMD stuff, which basically helps in any case).

Essentially it's turning:

    LOAD
    DISPATCH OP1
    DISPATCH OP2
    ... (once per operation in the expression)
    STORE
    ... (once per row)

into:

    DISPATCH
    LOAD OP1 STORE
    LOAD OP1 STORE
    ... (once per row)
    DISPATCH
    ... (once per operation in the expression)

The nice trade-off here is that you don't require code generation to do that, but it's still not optimal.

If you can generate code, it's even better to fuse the operations, to get something like:

    LOAD OP1 OP2 ... STORE
    LOAD ...

It helps because even though you can tune your batch size to get mostly cached loads and stores, it's still not free.

For example, on Haswell you can only issue 1 store per cycle, so if OP is a single add you're leaving up to 3/4 of your theoretical ALU throughput on the table.
evrydayhustling over 6 years ago
Pretty impactful work for an intern! One thing I would have liked to see, both from a process and communication standpoint, is leading with some stats on how much faster the vectorized loops are in isolation from the surrounding engine.

It's always good practice to dig into a deep project like this with some napkin estimates of how much you stand to gain, and how much overhead you can afford to spend setting yourself up for the faster computation. (Not to mention how much of your own time is merited!)
Comment #19049074 not loaded
nimish over 6 years ago
Awesome work. Really cool!

I have a hobby project to write an analytics DB that uses ISPC for vectorized execution. Currently not much (sums are real easy), but I really wonder if it could reduce the effort to vectorize these sorts of things.
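The "sums are real easy" remark refers to the simplest kind of columnar kernel: a reduction over a contiguous column. A minimal Go sketch (the commenter's project uses ISPC, not Go; this only illustrates the shape of such a kernel) is a tight, branch-free loop that compilers can turn into SIMD code:

```go
package main

import "fmt"

// sumColumn reduces one column batch. The loop body is just
// load + add with no per-row dispatch, which is what makes this
// kind of kernel straightforward to vectorize.
func sumColumn(col []float64) float64 {
	var acc float64
	for _, v := range col {
		acc += v
	}
	return acc
}

func main() {
	fmt.Println(sumColumn([]float64{1, 2, 3, 4})) // prints 10
}
```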
Comment #19051091 not loaded
blr246 over 6 years ago
Great write-up. Is the long-term vision to go completely to the vectorised query execution model, or are there cases where a row-oriented plan might be better, such as cases when there are complex computations involving multiple columns of a single row?
Comment #19055401 not loaded
sAbakumoff over 6 years ago
I looked at the title and thought that the article is about rolling joints with hash by using some advanced robotic technology and software. Damn!
Comment #19053661 not loaded
andonisus over 6 years ago
It was a pleasant surprise to see the code base written in Go!
jnordwick over 6 years ago
Column oriented dbs have been doing parallel (column per thread) joins for a while now, no? And I know they have been leaning heavily on vectorization for over a decade.
JanecekPetr over 6 years ago
Generic specialization (List&lt;int&gt;) is coming. See https://openjdk.java.net/projects/valhalla/. Not this year (value types will be first). But they're working on it really hard.