SQL is syntactic sugar for relational algebra

216 pointsby dmartoabout 1 year ago

22 comments

bkanukaabout 1 year ago

As someone who learned mathematics first and programming later, I think it took me about 10 years of working in data-intensive programming before I could write really "good" SQL from scratch.I completely attribute this to SQL being difficult or "backwards" to parse. I mean backwards in the way that in SQL you start with what you want first (the SELECT) rather than what you have and widdling it down. Also in SQL (as the author states) you often need to read and understand the structure of the database before you can be 100% sure what the query is doing. SQL is very difficult to parse into a consistent symbolic language.The turning point for me was to just accept SQL for what it is. It feels overly flexible in some areas (and then comparatively ridgid in other areas), but instead of fighting against this or trying to understand it as a consistent, precise language , I instead just go "oh SQL - you are not like the other programming languages I use but you can do some pretty neat stuff so we can be on good terms".Writing good SQL involves understanding the database, understanding exactly the end result you want, and only then constructing the subqueries or building blocks you need to get to your result. (then followed by some trial and error of course)

评论 #39800361 未加载

评论 #39799987 未加载

评论 #39799863 未加载

评论 #39799750 未加载

评论 #39799851 未加载

评论 #39805786 未加载

评论 #39800031 未加载

评论 #39800898 未加载

roenxiabout 1 year ago

I'm glad that the article concluded "No" to it's own headline. Calling SQL "syntactic sugar" is an insult to sugar. The "helpful diagram explaining how the scoping rules work" alone should make people blanch. The language is a syntactic disaster that we've been saddled with out of habit and inertia.

评论 #39797904 未加载

评论 #39799091 未加载

评论 #39797903 未加载

评论 #39799603 未加载

评论 #39800822 未加载

评论 #39799820 未加载

评论 #39800182 未加载

qazxcvbnmabout 1 year ago

As someone who has implemented a composable SQL generator from user-defined algebras of (arbitrary SQL) queries using relational algebra, I understand the shortcomings of SQL when viewed from an angle of a relational query language.However, SQL is a language with many facets (DML, DDL, DCL) other than 'relational' querying. Putting on a less mathematical and more engineering mindset, SQL ingratiates me by its interface to incredibly powerful primitives difficult to find anywhere else. (I've primarily worked with Postgres SQL)Consider the humble function; in SQL <a href="https://www.postgresql.org/docs/current/sql-createfunction.html" rel="nofollow">https://www.postgresql.org/docs/current/sql-createfunction.h...</a>, one can declare the function as `stable` or `immutable` to let the runtime optimise repeated calls; as `parallel` to let the runtime consider parallelisation, as `cost ...` and `rows ...` to aid optimiser cost estimation. Imagine if one could do that in C or Javascript!Another facet which regularly puts me in awe is the transaction isolation primitives and locking primitives offered by SQL.I understand that as a database language, SQL necessarily has these within its specialised niche, but it seems to me these aspects of SQL as an interface to a language runtime would be equally useful in the everyday program; in all these areas of functionality, SQL is so much more advanced than nearly every other general purpose programming language.

triskaabout 1 year ago

Codd's seminal paper, A Relational Model of Data for Large Shared Data Banks, states that a language based on applied predicate calculus "would provide a yard-stick of linguistic power for all other proposed data languages". Quoting from <a href="https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf" rel="nofollow">https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf</a>:"1.5 Some linguistic aspectsThe adoption of a relational model of data, as described above, permits the development of a universal data sub-language based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in normal form. Such a language would provide a yard-stick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented)."Languages based on predicate calculus indeed seem extremely suitable for reasoning about relational data. Datalog is a well-known example. It is more directly based on predicate logic, and much simpler than SQL.

refsetabout 1 year ago

> Lest you think is just one weird corner of the sql spec, I found this helpful diagram explaining how the scoping rules work (from Neumann and Leis, 2023)It's an excellent diagram, it really conveys the dissonance. Incidentally I interviewed Viktor Leis on a podcast last week about the paper where it's from: <a href="https://juxt.pro/blog/sane-query-languages-podcast/" rel="nofollow">https://juxt.pro/blog/sane-query-languages-podcast/</a>A lot of people seem to believe that LLMs or other ML methods can overcome the complexity challenges of generating SQL accurately, but I'm yet to be convinced that a database-powered AI revolution can happen without somehow bypassing SQL.

KingOfCodersabout 1 year ago

Everything is just syntactic sugar for something else. I'm syntactic sugar for the hydrogen atoms in my body.

评论 #39801525 未加载

评论 #39798044 未加载

samsquireabout 1 year ago

Thanks for this interesting post.Intuitively, relational algebra compresses enumeration over data in time that a CPU executing billions of cycles a second can feasibly and efficiently traverse and execute against many collections of millions or billions of records in human perceivable time thanks to indexes.I've been trying to think of systems communicating with eachother as parts of a relational model in the sense we can model system behaviour as a series of events and a join is a communication between components.I would love to talk about this with people.

评论 #39799617 未加载

评论 #39800659 未加载

评论 #39799472 未加载

exabrialabout 1 year ago

I’d much rather deal with the peculiarities of SQL than any of the attempted replacements (ones I’ve seen in my minted experience). Elastic for instance, other json based languages, are absolutely terrible. We lost something we when stopped writing ANSI standards.We’ve even stayed on InfluxDB og versions _because of _ the SQL like syntax, and also their improved languages are a nuclear disaster area.SQL, despite its flaws (null != null) is pretty good enough!

jameshartabout 1 year ago

Not totally convinced by the ORDER BY obstacles that the author raises..<pre><code> table('test').project('a').orderBy('b') </code></pre> > That's an error, because we can't order by a column that we just projected away. Right?assumes that 'projection' completely eliminates part of the underlying relation, but why does that have to be the case?If a relation includes 'selected fields' and 'hidden fields', and project just 'hides' the fields it doesn't project, while orderBy can operate on either projected or hidden fields, this ends up being perfectly sound.Even the more complex example which is translated as follows:<pre><code> translate('select a+1 as c from test order by b,c') => table('test').project('a','b').addColumn('a+1', as='c').orderBy('b','c').project('a') </code></pre> would work fine as:<pre><code> table('test') // selected: [a, b, ...], hidden: [] .addColumn('a+1', as='c') // selected: [a, b, c, ...], hidden: [] .project('c') // selected: [c], hidden: [a, b, ...] .orderBy('b','c') // selected: [c], hidden: [a, b, ...] </code></pre> (not sure why there's a .project('a') on the end of their version)Which is a reasonably local, algebraic transformation.

halayliabout 1 year ago

Relational algebra IR is implemented in MonetDB and discussed in their paper. Definitely worth reading.Not trying to be picky but pure relational algebra doesn't map to SQL and IMO it's not a good idea to attempt to do that due to the fact that relational algebra treats tuples as mathematical sets (ordering/uniqueness matters) while SQL does not(and has to deal with nullability).

jokingabout 1 year ago

A few tweaks here and there and it would be nice enough for me. Most of them are actually implemented by some engines but are not part of the standard. Just changing the order of the from and select clauses so autocomplete can know what fields can you use would be a nice enough change.

zvmazabout 1 year ago

I tried to study C. J. Date's books to understand relational theory... suffice it to say that I got nothing from his books, except a deep irritation partly due to his absolute pedantry...I finally learned SQL with a gentle introduction by Alan Beaulieu. I stumbled upon another book that's about the theory: Applied Mathematics for Database Professionals, by Lex deHaan, and Toon Koppelaars. Maybe these authors will benevolently teach me relational theory.But please avoid C. J. Date's books. And don't be him when writing a book or trying to explain something to another human being.

infogulchabout 1 year ago

SQL is pretty good all things considered.But I've always looked out for languages that can represent relational algebra concepts more directly. Maybe CozoaDB is close, though still immature. Any recommendations?

评论 #39801129 未加载

评论 #39799698 未加载

scythmic_wavesabout 1 year ago

This is a great write up. There appear to be a few camps forming in the comments and I’m in camp “SQL is confusing and attempts to explain it in terms of relational algebra have felt inadequate to me”.It also gives me some good follow up material to read. I’m particularly interested in that one link that forms subqueries and lateral joins in terms of a new “dependent join” operator.

评论 #39800752 未加载

barfbagginusabout 1 year ago

Can we call it syntactic ashtray? Because it feels like I'm sucking on 1970s ashtray when I see or use it.Those who have read their Spivak 2017 will know that databases are just Co-presheaves of Ologs over the Kliesli Category of the Power-Set Monad, the Identity Monad, or the Giry Monad. I would like a QL that acts like it!

评论 #39800864 未加载

samatmanabout 1 year ago

If you're interested in what it would take to put relational databases back on the rigorous footing of relational algebras, the Third Manifesto is a good place to start.<a href="https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf" rel="nofollow">https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf</a>I find it somewhat sad that an implementation of a database with a proper D language hasn't broken out and become a ubiquitous tool for the profession. There were some proprietary versions shortly after the manifesto's publication, but it never caught on.

aoeusnth1about 1 year ago

I find that most people who object to SQL do not use TVFs. If you don’t have any tools to easily break down the steps of the work, of course SQL will feel like an opaque Write-only language. With TVFs you can easily iteratively add more complex steps to your query while checking your work while you build.

评论 #39804409 未加载

keidabout 1 year ago

See C.J. Date's "An Introduction to Database Systems," <a href="https://www.amazon.com/Introduction-Database-Systems-8th/dp/0321197844" rel="nofollow">https://www.amazon.com/Introduction-Database-Systems-8th/dp/...</a> This is not news.

xbarabout 1 year ago

Discussions of what is/is not syntactic sugar are unapproachable for me because I cannot get past the abuse of sugar's essential functions in the tortured metaphor.

achr2about 1 year ago

You should look at LINQ in C#/.net . The SQL-like syntax always has a function-first equivalent, that gets across this point fairly eloquently.

r00fusabout 1 year ago

That diagram separating the syntactic vs. semantic layers of a SQL statement (from Neumann & Leis paper) is brilliant.

chubotabout 1 year ago

An analogy I like is - Are Perl-style regexes (used in Python, Ruby, Java, .NET, etc.) syntactic sugar for regular languages?The answer is no, because Perl added all sorts of imperative doodads to regexes, which can’t be easily represented and executed in the automata-based paradigm. Trying to do this is like a “research paper generator” (and not in a bad way), e.g.Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics - <a href="https://dl.acm.org/doi/abs/10.1145/3591262" rel="nofollow">https://dl.acm.org/doi/abs/10.1145/3591262</a> (2023)This is until Go and Rust, which used automata-based regexes from the beginning. I don’t think users have lost much.Purely automata-based engines are kind of pleasant to write, because almost everything is in the compiler, and not in the runtime, e.g. <a href="https://github.com/andychu/rsc-regexp/blob/master/py/README.md">https://github.com/andychu/rsc-regexp/blob/master/py/README....</a>That is, features like ? + * really are syntactic sugar for repetition. There’s also a lot of syntax sugar around character classes like [^a], and the runtime is very small.---Likewise, SQL seems to have so many non-relational doodads in its language design, which cause problems for implementers. In this case, I think there’s an incentive problem with SQL: It benefits vendors if their dialect is harder to re-implement. Although certainly they’ve added many useful features too in 4-5 decades!To me a language design issue is we never really “learned” to compose languages with different paradigms:- the set-based paradigms like relational algebra and regular languages, with- Turing-machine like code. (and also I/O!)We never learned polyglot programming, so each language becomes its own source of “reckless growth” – its own parochial backwater.Both regexes and SQL should be able to “escape” to normal code, and that would greatly simplify them. This can be done both by language implementers and by application programmers, i.e. “factoring” across languages. It’s not always obvious how to do this, but it certainly it can be done more than we do it today.---I’d argue the same phenomenon – lack of language composition – leads to programming languages within YAML. Github Actions is nominally some kind of “declarative” scheduler specification, or graph (job -> job dependencies), but that’s not enough for many problems.So it also has a bunch of doodads for escaping that model (to the extent it has a model).Shell, Awk, and Make also grew many doodads (<a href="https://www.oilshell.org/blog/2016/11/14.html" rel="nofollow">https://www.oilshell.org/blog/2016/11/14.html</a>), which are not very well designed. They used to be declarative languages, but no longer are.Although there is some distinction between “formerly set-based languages” like SQL and regex, and other “declarative” non-Turing-complete languages. But I think the language composition problem is approximately the same. Part of it is syntax, but a lot of it is semantics.(copy of lobste.rs comment)