Why your query language should be explicit

44 points by hiphipjorge, almost 10 years ago

21 comments

cwyers, almost 10 years ago

> In this query, we get all the users with the name 'jorge' are queried and then ordered in descending order by age.

> SELECT * FROM users WHERE name = 'jorge' ORDER BY age;

> If we wanted to dig deeper into this query, we might want to know if the "WHERE" is getting executed before the "ORDER BY". Can we tell from the query if this is the case? No, we can't. You'd have to look it up.

No I wouldn't. I can look right at that query and tell you what order those two things occur in. The order makes sense -- of course you don't sort the results before you filter the results, the SQL database is not a moron. In fact, in this weird "explicit" query language, you have to remember the SAME ORDER -- put the WHERE in front of the ORDER BY (or the filter in front of the orderBy, in RethinkQL logic). Except if you forget the order, you can end up creating a query that performs many times worse than it has to. Whereas in SQL, even if you forget the most basic information about the query plan possible, the query planner will choose a pretty good execution strategy for your query.

And if you can't figure out how that simple SQL query is going to be executed by the server just by reading it, why on Earth do you need a query language that does not just allow but requires you to set your own query plan? The ability to shoot yourself in the foot isn't a feature.

geophile, almost 10 years ago

I don't agree with this conclusion. I've worked with Postgres, MySQL and Oracle, and found that it is important to have a good understanding of the possible execution plans for a query. I will sometimes construct a query very carefully to achieve a particular plan. And when things go wrong, I run EXPLAIN PLAN, examine statistics, etc., and tweak my query to do what I want. You really have to do that to obtain good performance in some cases. It does undercut SQL's claim to being non-procedural, but that's life.

I do NOT want to have to construct the query execution plan manually for every single query. Usually, the optimizer will do a fine job, assuming the database designer has made good choices regarding indexes and other physical design issues. But there are always a few complex, tricky queries where you do have to understand internals. The "cognitive load" is the same, and having a high-level query language means that when you do make your subtle and deft change to rescue performance, you often just tweak a query, instead of rewriting a detailed query plan.

(Extending this position a little: I have come around 180 degrees and consider ORMs to be a bad idea overall. When you do need to be very careful about writing a query, the ORM adds a layer of complexity -- not only do you need to control the SQL, but you now need to ensure that your ORM can produce that SQL.)

Xophmeister, almost 10 years ago

SQL is declarative: you tell the RDBMS what you want and its job is to do it. That's how SQL is and how it's always been. A better RDBMS will optimise the execution plan for you. The point about indexing is somewhat valid, but that's an integral part of schema design and something one should define from the outset based on the data's intended use.

By 'explicit', what I presume the author to mean is 'transparent'. I agree that development processes should be transparent, but I don't necessarily agree that imperative is better than declarative. Indeed, the use case for SQL is data manipulation and analysis; arguably that doesn't come under the remit of 'development', even though programming is involved. Hence the prerequisite of a properly set up schema by someone who knows what they're doing!

Declarative languages definitely have their place.

batbomb, almost 10 years ago

The biggest problem with a procedural/explicit query is a dynamic system. Without a query planner, you don't have the luxury of a system rewriting your queries. When Table A ~ Table B, but then Table B >> Table A, your queries are going to be radically suboptimal.

Of course, if you're never joining, maybe that's not such a big problem, but you'd have the same issue if a specific range in your table grows disproportionately to another range in the same table and the index you are using is incorrect.

With SQL you can often be very explicit using Common Table Expressions as well in DBs that support it. Otherwise, using subqueries, GROUP BY with HAVING, and several other features often, but not always, prevent radically rewritten queries.

Finally, with SQL on many DBMSs you can still get the explicit wiggle room you need using optimizer hints. No, it's not very portable, but neither is ReQL.

Edit: A few examples of being explicit in SQL (à la Oracle):

    SELECT /*+ INDEX(name) */ *
    FROM users
    WHERE users.name = 'jorge';

    WITH users_by_name AS (
        SELECT /*+ INDEX(name) */ *
        FROM users
        WHERE users.name = 'jorge'
    )
    SELECT *
    FROM users_by_name
    JOIN profile USING (user_id)
    ORDER BY age;

rwallace, almost 10 years ago

He says he's been using the explicit query language for a couple of months. On that timescale, I can see how it might still feel okay. But as the months become years and your code grows in complexity to meet an ever lengthening requirements list, it should become apparent why SQL is far superior.

Need to add an index to speed up some queries?

In SQL, you add the index and you're done.

In an explicit query language, you add the index and... oops, that doesn't do anything. You've got to go back and inspect every single query, anywhere in your program, that could potentially benefit from that index, to see whether it actually will, and if so, modify it by hand.

Switching from SQL to an explicit query language converts certain types of programming effort from O(N) to O(N^2). This is one of the reasons SQL was invented in the first place.
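The "add the index and you're done" claim can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a full SQL server; the table layout, the index name idx_users_name, and the helper query_plan are made up for illustration, and the exact plan wording varies between SQLite versions. The query text never changes; only the plan does.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("jorge", 30), ("ana", 25), ("jorge", 41)],
)

# The application query is written once and never edited.
QUERY = "SELECT * FROM users WHERE name = 'jorge'"


def query_plan(connection, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY)   # full table scan
conn.execute("CREATE INDEX idx_users_name ON users (name)")
plan_after = query_plan(conn, QUERY)    # planner picks up the new index

print("before:", plan_before)
print("after: ", plan_after)
```

In an explicit query language, the equivalent change would mean editing every call site that should use the new index, which is the maintenance cost described above.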

kragen, almost 10 years ago

Jorge writes:

> Does this increase cognitive load? Yes, it does. But this is outweighed by the ability to understand how your query is being executed. … Hence, when you see a query you immediately know that it's using an index…

This reminds me of something I read in a paper once:

> Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

It seems like this Jorge dude is claiming that it's great that, if you use his company's product, you have to change your program when you change the representation and organization of data on your disk, and that there are no real disadvantages to this. I think maybe he should read the paper I'm quoting from above, which is Codd 1970, introducing the relational database (https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf). Codd explains why the 1960s CODASYL systems similar to RethinkDB made programs unmaintainable.

If you don't understand why relational databases got adopted in the first place, you aren't qualified to "rethink databases" or to call yourself a "full-stack developer". And your gullible customers, although they may get a prototype built quickly, will be outcompeted by their rivals who aren't afraid of using query optimizers. Jorge must think we're all fucking idiots who don't know why we abandoned products like RethinkDB thirty or forty years ago.

Fortunately, the HN thread is much more intelligent and informed than the original article!

jasode, almost 10 years ago

I don't agree with the essay's premise.

> Having your query language be explicit means that you've hit at exactly the right level of abstraction: not too much, but not too little.

I don't like the label of "explicit" as if it's an objective indicator on the continuum between low and high abstraction. It comes across as a value judgement that we'd all agree on and I don't think there's obvious consensus on architecting a data access language.

> If we wanted to dig deeper into this query, we might want to know if the "WHERE" is getting executed before the "ORDER BY". Can we tell from the query if this is the case? No, we can't. You'd have to look it up.

Not knowing the internals of execution is actually a deliberate design feature of SQL. The SQL is meant to be a declarative statement that expresses an algebraic set of rows. (But sometimes, the mathematical purity of this abstraction "leaks" and DBAs/Devs have to add HINTS or do SQL EXPLAIN PLAN to dig into what's happening under the hood -- but that's a separate issue.)

I suppose if one really wanted to affect the order of operations at the SQL syntax level, one could write a VIEW or a subquery with the ORDER BY and then write the outer query with the WHERE clause. I haven't tested this to see if any of the major SQL engines would rewrite this type of convoluted SQL of ORDER BY -then- WHERE clause.

Yes, with UNIX command line, you have different execution characteristics of "ls | grep | sort" vs "ls | sort | grep" but one can't translate that explicit-sequence-of-execution mental model to SQL.

> Does this increase cognitive load? Yes, it does. But this is outweighed by the ability to understand how your query is being executed.

I'm not convinced of this conclusion.

Also, I'm not sure RethinkDB works like this as a deliberately engineered advantage. The RethinkDB devs can clarify but it's possible for their engine to work like this because it's more straightforward to implement the parser and not because there is overwhelming inherent superiority to this approach. With traditional SQL (e.g. Oracle, MSSQL, etc), the query rewriting engines are very mature and can be more aggressive, thereby fulfilling the goal of declarative mathematical purity. However, it takes lots of programmer man-hours to translate declarative SQL into optimal execution plans.

jmileham, almost 10 years ago

This is a great way to spin not having a query planner as a feature, but I'm glad to have one every day that I write and compose semantic bits of SQL that can and should have different execution plans depending on the context in which they're evaluated.

mnarayan01, almost 10 years ago

As others have already mentioned, what the author is calling "explicit" would be more typically called "imperative". I'm going to go further though and say that I think "explicit" as used here is actually wrong. Take the example from the article:

    SELECT * FROM users WHERE name = 'jorge' ORDER BY age;

and assume that we have exactly one index on the table, namely a compound one on `name, age`. If that index is used, then _only_ the filtering need be done. In SQL, because the execution sequence is left unspecified, we can continue to write the query as is while still allowing the DB to skip the unneeded ordering step (whether it does so or not is a different question obviously).

If, however, the execution sequence must be "specified" (cf. "explicit") and you don't want to perform the unnecessary ordering step, then either the order must be left out of the query (and thus implicit), or the DB needs to be able to ignore what you tell it to do.
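Whether a given engine really skips the ordering step can be inspected directly. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption: the thread is about SQL engines in general, and the schema, the index name idx_users_name_age, and the helper query_plan are invented for the example. Without the compound index SQLite reports a separate sort step; with it, the sort step should disappear because the index already yields rows for one name in age order.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("jorge", 41), ("ana", 25), ("jorge", 30)],
)

QUERY = "SELECT * FROM users WHERE name = 'jorge' ORDER BY age"


def query_plan(connection, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# No index yet: the plan includes an explicit sort (a temporary B-tree).
plan_without_index = query_plan(conn, QUERY)

# Compound index on (name, age): filtering through it already returns
# the matching rows in age order, so no separate sort is needed.
conn.execute("CREATE INDEX idx_users_name_age ON users (name, age)")
plan_with_index = query_plan(conn, QUERY)

ages = [row[2] for row in conn.execute(QUERY)]

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
print("ages:", ages)
```

The query text is identical in both runs, which is the commenter's point: the ordering requirement stays in the query while the ordering work can vanish from the plan.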

meritt, almost 10 years ago

No. Learn to trust the query planner. It's a good thing. When you want explicitness, take a look at the explain plan.

takeda, almost 10 years ago

I think this is a silly argument. It's much easier to write a database without a query planner than with a query planner.

The reason why you would want to have an implicit language is that the optimal query might be different depending on what data you have and even what you are querying.

For example, if the table has only 5 jorges it's probably better to use an index, but if the majority of users are jorges or the table is very small it's far more efficient to just scan it.
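The trade-off in the last paragraph can be made concrete with a toy cost model. Everything here is hypothetical: the function choose_access_path, the constant RANDOM_READ_PENALTY, and the cost formulas are simplifications made up for illustration, not how any particular planner computes costs. The sketch only shows why the best access path depends on the data rather than on the query text.

```python
import math

# Hypothetical constant: how much more expensive it is to fetch one row
# through an index (random access) than to read one row in a sequential scan.
RANDOM_READ_PENALTY = 4.0


def choose_access_path(table_rows, matching_rows):
    """Pick 'index' or 'scan' using a deliberately simplified cost model."""
    # A full scan reads every row once, sequentially.
    full_scan_cost = float(table_rows)
    # An index lookup walks the tree, then fetches each match individually.
    index_cost = math.log2(max(table_rows, 2)) + matching_rows * RANDOM_READ_PENALTY
    return "index" if index_cost < full_scan_cost else "scan"


# Large table, only 5 jorges: the index wins.
print(choose_access_path(1_000_000, 5))
# Large table, mostly jorges: scanning is cheaper than 900,000 random reads.
print(choose_access_path(1_000_000, 900_000))
# Very small table: not worth touching the index at all.
print(choose_access_path(10, 5))
```

A planner re-evaluates this kind of decision as table statistics change; a hand-written plan keeps whichever choice was right on the day it was written.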

orf, almost 10 years ago

I hate to criticize but this stood out to me:

> SELECT * FROM users WHERE name = 'jorge' ORDER BY age;

> Can we tell from the query if this is the case? No, we can't. You'd have to look it up.

and then:

> r.table('users').filter({ name: 'jorge' }).orderBy(r.desc('age'))

> Now, can you tell from the query if the users are filtered or ordered first? Yes! filter comes first.

The filter comes first in both queries. It's exactly the same.

The part about indexes is interesting though.
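What is at stake in the order of the two steps can be sketched with an in-memory pipeline. This is plain Python, not ReQL or SQL; the data set and the helper run_pipeline are invented for illustration. Both orders return the same rows, but the sort step touches very different numbers of rows, which is the cost a hand-ordered pipeline exposes and a planner would normally avoid.

```python
# Illustrative only: an in-memory stand-in for a "users" table,
# with 10 rows named 'jorge' out of 1000.
users = [
    {"name": "jorge" if i % 100 == 0 else f"user{i}", "age": (i * 7) % 90}
    for i in range(1000)
]


def run_pipeline(rows, filter_first):
    """Return (result, number of rows the sort step had to touch)."""
    sorted_rows = 0

    def age_key(row):
        # sorted() calls the key function once per input row.
        nonlocal sorted_rows
        sorted_rows += 1
        return row["age"]

    def is_jorge(row):
        return row["name"] == "jorge"

    if filter_first:
        result = sorted(filter(is_jorge, rows), key=age_key)
    else:
        result = list(filter(is_jorge, sorted(rows, key=age_key)))
    return result, sorted_rows


filtered_first, cost_filter_first = run_pipeline(users, filter_first=True)
sorted_first, cost_sort_first = run_pipeline(users, filter_first=False)

print(filtered_first == sorted_first)       # same answer either way
print(cost_filter_first, cost_sort_first)   # rows sorted in each order
```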

TheLoneWolfling, almost 10 years ago

What I want is a language with feedback.

In other words, a language that calculates all of the various optimizations behind the scenes, sees which ones it thinks would be good, and suggests them to you. And you can add annotations to allow it to do specific optimizations.

It has control and transparency, but keeps it relatively easy to optimize. And you can hide the annotations if you really wish.

ksherlock, almost 10 years ago

This seems like a massive step backwards. And it's not for the benefit of the user (the programmer, she needs to do more work). It's for the benefit of the RethinkDB programmer (query planners are hard work!). Add an index in SQL? Existing queries work and can make use of it. Add an index in RethinkDB? Now go rewrite all your code if you want to take advantage of it. That's an improvement? (And do you think the average JavaScript programmer will do a better job at it than 30 years of database research?)

If you're trying to spin shit into gold, maybe you should try a rethink preprocessor. Just write normal SQL in your code and the pre-processor verifies the tables and columns, checks for indices, and writes the best "explicit" query for you.

kylepdavis, almost 10 years ago

I think explicit languages can make things clearer; however, I disagree with the notion that implicit behaviors are necessarily a bad thing.

I've found that the hybrid approach in the MongoDB aggregation framework works really well.

It optimizes things around the first $match to create an optimized initial read (the selectivity of your initial stages is really important). Once you're past the initial read the rest of the pipeline is fully imperative.

This makes things really nice when debugging complex aggregation pipelines. For example, you can simply omit the rest of your pipeline at any point to debug (with a $limit), see what you're dealing with, fix them, and move on to the next one.

jtwebman, almost 10 years ago

Have you built any really big data projects with RethinkDB or just demo apps? There are some big advantages to having query planners and optimizers in the database. Also in SQL you can be explicit. You can tell it what index to use.
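As one concrete instance of telling the database which index to use, SQLite has an INDEXED BY clause (other engines have their own mechanisms, such as the Oracle hints shown earlier in the thread). The sketch below runs it through Python's sqlite3 module; the schema, the two index names, and the helper query_plan are assumptions made for the example, and which engine the commenter had in mind is not stated.

```python
import sqlite3

# Hypothetical schema with two candidate indexes, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")
conn.execute("CREATE INDEX idx_users_age ON users (age)")


def query_plan(connection, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# INDEXED BY forces a specific index; SQLite raises an error rather than
# silently falling back if that index cannot be used for the query.
forced_plan = query_plan(
    conn,
    "SELECT * FROM users INDEXED BY idx_users_age "
    "WHERE name = 'jorge' AND age = 30",
)

# NOT INDEXED goes the other way and forbids index use for this table.
no_index_plan = query_plan(
    conn,
    "SELECT * FROM users NOT INDEXED WHERE name = 'jorge'",
)

print("forced:  ", forced_plan)
print("no index:", no_index_plan)
```

The explicit form is opt-in per query: statements without the clause are still left to the planner.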

oconnor663, almost 10 years ago

> But RethinkDB won't optimize the query for you or tell you it's wrong. It'll just run it. It's up to the developer to understand what's going on and optimize accordingly. This might sound like a huge deal, but the simplicity of the language makes it easy to spot these inefficiencies and fix them accordingly.

That's the kind of thing that's only true until it's not. The bigger a query gets, the less likely that you'll be able to eyeball it to see what you screwed up.

ris, almost 10 years ago

Besides the fact that I don't agree with his conclusion - he ignores any of the advantages of an implicit language and therefore the article suffers from the "everybody must be stupid" syndrome...

"Jorge Silva, Dev Evangelist @ RethinkDB. Full-Stack JavaScript Developer."

That final sentence makes me cringe slightly.

rusabd, almost 10 years ago

5 years later there will be an article about a great new feature in XX-DB - a query planner.

jamesrom, almost 10 years ago

A lot of people are saying 'just trust the query planner'.

The best of both worlds would be the ability to explicitly define how to get what you want just as easily as you can define what you want.

That should be the goal.

DrScump, almost 10 years ago

"In this query, we get (sic) all the users with the name 'jorge' are queried and then ordered in descending order by age."

Am I missing something, or should that say "ASCending"?
