How Query Engines Work

315 points by asicsp over 1 year ago

12 comments

willvarfar over 1 year ago
An awesome read!

Something related that I found out about from HN a few months back is another engine called quokka. It's particularly interesting and applicable how quokka schedules distributed queries to outperform Spark: https://github.com/marsupialtail/quokka/blob/master/blog/why.md
refset over 1 year ago
> KQuery does not yet implement the join operator.

Whilst I applaud this book writing initiative, completing it could easily become a lifetime's work! It will be a fascinating journey to follow along with in any case.

Apache Arrow could (and hopefully will) really shake up the database industry in the years ahead. Whatever eventually supplants Postgres is quite likely going to be based on Arrow - polyglot zero-copy vector processing is the future.

Aside: for anyone looking for a more theoretical overview of databases and query languages, this ~free Foundations of Databases book still holds up well: http://webdam.inria.fr/Alice/
flakiness over 1 year ago
If you prefer a more academic POV, there is a WIP book called "Building Query Compilers":

https://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf
luguenth over 1 year ago
I'm amused that sometimes, when you start learning something, you see resources for learning it everywhere. (And I just learned it's called the frequency illusion or Baader-Meinhof phenomenon [0])

Thank you so much for publishing this book for free, I'm eager to dive in! Is there any chance you can also link it as ePub?

[0]: https://en.wikipedia.org/wiki/Frequency_illusion
iamcreasy over 1 year ago
At the beginning of the book it says 'The real power of Spark is that this query can be run on a laptop or on a cluster of hundreds of servers with no code changes required.'

Is this accurate? I thought the power of Spark is that data stays in memory on each node in the cluster. Being able to run on a cluster in a fault-tolerant manner was already achieved by the Hadoop ecosystem.
crabbone over 1 year ago
    val spark: SparkSession = SparkSession.builder
      .appName("Example")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read.parquet("/mnt/nyctaxi/parquet")
      .groupBy("passenger_count")
      .sum("fare_amount")
      .orderBy("passenger_count")

    df.show()

That's the first listing in this book. This is a disaster... Query language implemented in a pug-ugly other language which has to rely on two other query languages to function. This is the worst sales pitch in the history of sales pitches. Maybe ever.
poorman over 1 year ago
I bought this book back in March 2020 and it was a great read then. Since then, I've received numerous updates from Leanpub on the additions Andy has written!
robertkoss over 1 year ago
Andy Grove is a great guy, and the book is excellent. This might be interesting as well: https://www.youtube.com/watch?v=NEL6DluUxgw&ab_channel=DeltaLake
crabbone over 1 year ago
> A logical plan represents a relation (a set of tuples) with a known schema. Each logical plan can have zero or more logical plans as inputs. It is convenient for a logical plan to expose its child plans so that a visitor pattern can be used to walk through the plan.

This is not a definition. A definition needs to do at least a modicum of convincing. Like, why did the author decide that a plan represents a relation? Most plans I've seen in my life were grocery shopping list-like, or maybe decision tree-like. I can see no way to get from "plan" to "set of tuples".

Also, what does the visitor pattern have to do with anything? Why is this bit of information even relevant?
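
For anyone trying to picture the quoted passage, here is a minimal Scala sketch of a plan node that has a known schema and exposes its child plans, so one generic function can walk any plan. Every name in it (`PlanSketch`, `Scan`, `Projection`, `render`) is hypothetical and chosen for illustration; this is not the book's KQuery code, and it uses pattern matching rather than a classic visitor class.

```scala
// A sketch under stated assumptions: all names here are hypothetical.
object PlanSketch {
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: List[Field])

  sealed trait LogicalPlan {
    def schema: Schema               // schema of the relation this plan produces
    def children: List[LogicalPlan]  // zero or more input plans
  }

  // Leaf node: reads a table and has no inputs.
  final case class Scan(table: String, schema: Schema) extends LogicalPlan {
    val children: List[LogicalPlan] = Nil
  }

  // Keeps only the named columns of its single input.
  final case class Projection(input: LogicalPlan, columns: List[String]) extends LogicalPlan {
    val schema: Schema = Schema(input.schema.fields.filter(f => columns.contains(f.name)))
    val children: List[LogicalPlan] = List(input)
  }

  // Because every node exposes `children`, this walker needs no per-node recursion logic.
  def render(plan: LogicalPlan, indent: Int = 0): String = {
    val label = plan match {
      case Scan(table, _)      => s"Scan: $table"
      case Projection(_, cols) => s"Projection: ${cols.mkString(", ")}"
    }
    ("  " * indent) + label + "\n" + plan.children.map(render(_, indent + 1)).mkString
  }

  def main(args: Array[String]): Unit = {
    val taxi = Scan("nyctaxi", Schema(List(Field("passenger_count", "Int"), Field("fare_amount", "Double"))))
    print(render(Projection(taxi, List("fare_amount"))))
  }
}
```

In this sketch the plan is a description of a relation: each node says what rows and columns it yields, and the tree shape says where its inputs come from. Running `main` prints the projection on the first line and the scan indented beneath it.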
crabbone over 1 year ago
> The first step in building a query engine is to choose a type system to represent the different types of data that the query engine will be processing.

This. Out of nowhere. Why would this be the first step? There's no explanation... Why can't this be the tenth or the hundredth step?
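
One possible reading of the quoted sentence, offered as an assumption rather than the author's stated reasoning: schemas and expressions are both written in terms of data types, so those types have to exist before either can be defined. A minimal Scala sketch, with hypothetical names (`TypeSketch`, `Int64Type`, `sumResultType`) that are not from the book:

```scala
// A sketch under stated assumptions: all names here are hypothetical.
object TypeSketch {
  sealed trait DataType
  case object Int64Type extends DataType
  case object Float64Type extends DataType
  case object StringType extends DataType

  final case class Field(name: String, dataType: DataType)

  final case class Schema(fields: List[Field]) {
    // Look up the type of a column, if the column exists.
    def typeOf(column: String): Option[DataType] =
      fields.find(_.name == column).map(_.dataType)
  }

  // Result type of sum(column), or an error for a missing or non-numeric column.
  def sumResultType(schema: Schema, column: String): Either[String, DataType] =
    schema.typeOf(column) match {
      case Some(t @ (Int64Type | Float64Type)) => Right(t)
      case Some(other) => Left(s"cannot sum column '$column' of type $other")
      case None        => Left(s"no such column: $column")
    }

  def main(args: Array[String]): Unit = {
    val schema = Schema(List(Field("fare_amount", Float64Type), Field("vendor", StringType)))
    println(sumResultType(schema, "fare_amount"))
    println(sumResultType(schema, "vendor"))
  }
}
```

Here even a single aggregate like `sum` cannot be checked without asking the schema for a column's type, which is one way to see why a type system might come early in the build order.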
cppluajs over 1 year ago
Really nice! Does anyone know if there is a list of such website books / gitbooks? I try to bookmark these whenever I come across them, but I believe there must be so many great ones I don't know of.
slt2021 over 1 year ago
Code is very elegant, love Kotlin in contrast to Java