An awesome read!<p>Something related that I found out about from HN a few months back is another engine called quokka. Particularly interesting, and relevant here, is how quokka schedules distributed queries to outperform Spark: <a href="https://github.com/marsupialtail/quokka/blob/master/blog/why.md">https://github.com/marsupialtail/quokka/blob/master/blog/why...</a>
> KQuery does not yet implement the join operator.<p>Whilst I applaud this book writing initiative, completing it could easily become a lifetime's work! It will be a fascinating journey to follow along with in any case.<p>Apache Arrow could (and hopefully will) really shake up the database industry in the years ahead. Whatever eventually supplants Postgres is quite likely going to be based on Arrow - polyglot zero-copy vector processing is the future.<p>Aside: for anyone looking for a more theoretical overview of databases and query languages, this ~free Foundations of Databases book still holds up well <a href="http://webdam.inria.fr/Alice/" rel="nofollow noreferrer">http://webdam.inria.fr/Alice/</a>
If you prefer a more academic point of view, there is a work-in-progress book called "Building Query Compilers"<p><a href="https://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf" rel="nofollow noreferrer">https://pi3.informatik.uni-mannheim.de/~moer/querycompiler.p...</a>
I'm amused that sometimes, when you start learning something, you see resources for learning it everywhere. (And I just learned it's called the frequency illusion, or Baader-Meinhof phenomenon [0].)<p>Thank you so much for publishing this book for free, I'm eager to dive in! Is there any chance you could also make it available as an ePub?<p>[0]: <a href="https://en.wikipedia.org/wiki/Frequency_illusion" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Frequency_illusion</a>
At the beginning of the book it says 'The real power of Spark is that this query can be run on a laptop or on a cluster of hundreds of servers with no code changes required.'<p>Is this accurate? I thought the power of Spark is that data stays in memory on each node in the cluster. Being able to run on a cluster in a fault-tolerant manner was already achieved by the Hadoop ecosystem.
<p><pre><code>  val spark: SparkSession = SparkSession.builder
    .appName("Example")
    .master("local[*]")
    .getOrCreate()
  val df = spark.read.parquet("/mnt/nyctaxi/parquet")
    .groupBy("passenger_count")
    .sum("fare_amount")
    .orderBy("passenger_count")
  df.show()
</code></pre>
That's the first listing in this book. This is a disaster... A query language implemented in another, pug-ugly language, which has to rely on two other query languages to function. This is the worst sales pitch in the history of sales pitches. Maybe ever.
I bought this book back in March 2020 and it was a great read then. Since then, I've received numerous updates from Leanpub with the additions Andy has written!
Andy Grove is a great guy, and the book is excellent. This might be interesting as well: <a href="https://www.youtube.com/watch?v=NEL6DluUxgw&ab_channel=DeltaLake">https://www.youtube.com/watch?v=NEL6DluUxgw&ab_channel=Delta...</a>
> A logical plan represents a relation (a set of tuples) with a known schema. Each logical plan can have zero or more logical plans as inputs. It is convenient for a logical plan to expose its child plans so that a visitor pattern can be used to walk through the plan.<p>This is not a definition. A definition needs to do at least a modicum of convincing. Like, why did the author decide that a plan represents a relation? Most plans I've seen in my life were grocery-shopping-list-like, or maybe decision-tree-like. I can see no way to get from "plan" to "set of tuples".<p>Also, what does the visitor pattern have to do with anything? Why is this bit of information even relevant?
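For what it's worth, here is a minimal sketch of what the quoted passage seems to be describing. It is an illustration only, not KQuery's actual code: every name in it (Field, Schema, Scan, Projection, walk, format) is hypothetical, and pattern matching over a sealed trait stands in for a classic visitor. Each node describes the shape of the rows it would produce (its schema) and exposes its inputs, so one generic walk can handle any plan without knowing the node types in advance.

```scala
// Hypothetical sketch; none of these names come from the book or from KQuery.
object LogicalPlanSketch {
  // A column name paired with a type name, standing in for a real type system.
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: List[Field])

  // A logical plan describes a relation with a known schema and exposes its inputs.
  sealed trait LogicalPlan {
    def schema: Schema
    def children: List[LogicalPlan]
  }

  // Leaf node: reads from a data source, so it has zero inputs.
  final case class Scan(path: String, schema: Schema) extends LogicalPlan {
    val children: List[LogicalPlan] = Nil
  }

  // One input; the output schema keeps only the projected columns.
  final case class Projection(input: LogicalPlan, columns: List[String]) extends LogicalPlan {
    val schema: Schema = Schema(input.schema.fields.filter(f => columns.contains(f.name)))
    val children: List[LogicalPlan] = List(input)
  }

  // Generic pre-order walk: it relies only on `children`, not on concrete node types.
  def walk(plan: LogicalPlan, depth: Int = 0)(visit: (LogicalPlan, Int) => Unit): Unit = {
    visit(plan, depth)
    plan.children.foreach(child => walk(child, depth + 1)(visit))
  }

  // One use of the walk: render the plan as an indented tree, one node per line.
  def format(plan: LogicalPlan): String = {
    val lines = scala.collection.mutable.ListBuffer.empty[String]
    walk(plan) { (node, depth) =>
      val label = node match {
        case Scan(path, _)       => s"Scan: $path"
        case Projection(_, cols) => s"Projection: ${cols.mkString(", ")}"
      }
      lines += ("  " * depth) + label
    }
    lines.mkString("\n")
  }
}
```

With something like this, Projection(Scan(path, schema), List("fare_amount")) is a plan whose schema is just the fare_amount column. That is the sense in which a plan "represents a relation": it is a description of a result set and its columns, not the rows themselves.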
> The first step in building a query engine is to choose a type system to represent the different types of data that the query engine will be processing.<p>This. Out of nowhere. Why would this be the first step? There's no explanation... Why can't this be the tenth or the hundredth step?
Really nice! Does anyone know of a list of such website books / gitbooks? I try to bookmark these whenever I come across them, but I believe there must be many great ones I don't know of.