TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Lab Notes: How We Made Joins 23 Thousand Times Faster, Part Two

3 pointsby nslateralmost 7 years ago

1 comment

mgsouthalmost 7 years ago
tldr: CrateDB implemented hashed joins, where both tables are too big to fit into memory, by reading left table in chunks and scanning entire right table for each chunk.<p>Part one at <a href="https:&#x2F;&#x2F;crate.io&#x2F;a&#x2F;lab-notes-how-we-made-joins-23-thousand-times-faster-part-one&#x2F;" rel="nofollow">https:&#x2F;&#x2F;crate.io&#x2F;a&#x2F;lab-notes-how-we-made-joins-23-thousand-t...</a><p>The hashing is used for equi-joins, where two tables are related with (possibly multiple) equality operators; e.g. &quot;select * from t1 join t2 on t1.a = t2.b and t1.x = t2.y&quot;.<p>The benchmarks show a very large improvement over the previous algorithm, but it&#x27;s still O(M&#x2F;c*N&#x2F;d). It would be interesting to see CrateDB only keep the hashes in memory, possibly using Bloom filters or such, then re-read the tables, ignoring any rows that don&#x27;t match a hash. If the _selected_ rows from the second table will fit into memory, then you can get O(2M+N).