SQL interview questions are an interesting counterpoint to stereotypical programming interviews: while typical algo questions in SWE interviews tend to test what's taught in academic contexts but have little real-world application; the questions in SQL interviews are more practical outside the whiteboard.<p>A weakness of these types of SQL questions however is that it's near impossible for the interviewer to provide help/guidance; the questions are often know-it-or-don't questions (especially involving window functions w/ uncommon syntax). A data science interviewer years ago mocked me during a whiteboard test for not knowing the BETWEEN ROW syntax for a window function.<p>That said, as an IRL data scientist, the amount of times I've had to do a SQL self-join in the past few years can be counted on one hand. And the BETWEEN ROW syntax for a window function.
This is idiotic. Why in the world would testing for rote memorization of something anyone can look up easily be a reasonable filter for talent and experience in a programming role?<p>A friend of mine did numerous interviews at a large company, hours out of his time and those of the interviewers, only to be caught up by some inane SQL question asked by a know-nothing after the entire process of interviews had been completed.<p>Why not ask about obscure regex expressions? Better yet, how about baseball scores? Hair bands from the 80s?<p>It's time for the valley to get real about how to judge the merit of applicants. The present state of affairs in tech recruiting is a joke.
Checked just the first two answers:<p>1. MoM Percent Change<p>It's better to use windowing functions, I believe it should be faster than self-join.<p>2. It seems that the first solution is wrong -- it returns whether "a"-s parent is Root/Inner/Leaf, not "a" itself.<p>I'd instead add a "has_children" column to the table, and then it would be clear.<p>Second solution works, but without optimization it's 1 query per row due to nested query -- slow, but not mentioned.
I think for a lot of people, SQL is a skill that doesn't stick.<p>You learn enough to do the queries you need for your project, they work then you forget about them as you work on the rest of your project. These skills are perishable. Left outer join? Yeah, I knew what that was some time ago, but not anymore<p>The days of dedicated SQL programers are mostly gone.
I think these are great. But I think there should be some representation around locking / concurrency / deadlock topics. Those tend to be the hardest because you can’t clearly recreate the right/wrong answer in a local test environment. Speaking as a person who waited far too long in his career to fully appreciate these topics, I wish I had been pushed to learn them much earlier.
Hey, HN! Since I couldn't find a good resource online for the self-join and window function SQL questions I've encountered over the years in interviews, I made my own study guide which I'm now sharing publicly as a Quip doc. Would love your feedback or thoughts!
I think SQL is a language to be known by every programmer.
With the right query, you may solve a problem that may take 100 lines in other languages.<p>It is so usefull, reliable and does not change every year.
For the second one, it seems most natural to reach for <i>exists</i>, or something (I have not tried this code..)<p><pre><code> select
node
, (case when parent is null then 'Root'
when exists (
select * from tree c
where c.parent = node
) then 'Inner'
else 'Leaf'
end) "label"
from tree
</code></pre>
EDIT: also, in the fourth, it seems like you'd want to <i>partition</i> the window function, who cares about order. Something like<p><pre><code> sum(cash_flow) over (partition by date) "cumulative_cf"</code></pre>
Great article. I can't help but feel like SQL is a poor choice for some of this stuff, though. More often than not, I find it much easier to pull the raw data into memory, and use a higher level language to do these sorts of queries. I am all for knowing the intricacies of SQL, as the cost for not can be very high, but I'm curious for your opinion here.
No questions/examples featuring recursive CTE's? They tend to come up in anything involving queries over trees or graphs. They're also a relatively new feature where having some examples to show how they work may be quite helpful.
I flip back and forth between deep diving in (my case) SQL Server skills and .NET Manipulation. In the world I live, it makes the most sense to do set based manipulation in SQL and logical entity based logic in C#. I work in a unique enterprise niche that has about 4 options based one either java or .net. Sql knowledge definitely gives you a leg up for complex reporting, and there are cases where I love being able to debug super quickly when comparing inputs to outputs. However, when I run into a SQL script that is 5000+ lines long and have to debug it, I much prefer the .NET side of the fence. Should someone ever come up with a bridge that gives you .NET level of visibility into the active datasets in a SQL query I would pay them 4 figures without question...
Very interesting. I never really "got" declarative languages, I remember a very long time ago I was working with Oracle and you could see the "execution plan" for your SQL queries. I kept wondering "why can't I build my queries directly with this?" - it seems so much simpler to my brain than SQL itself.
One problem with this article is the number of times the solution involves COUNT(DISTINCT).<p>One of the best SQL interview questions is "Explain what is wrong with DISTINCT and how to work around it".
The first few answers are unidiomatic where I work. Analytical functions would be vastly preferable to self joins, especially in the case of the join with an inequality that is proposed to compute running totals, which I assume would have horrible performance on large datasets
RDBMS platforms without function indexes means that some of these queries will force a row-by-row execution over your entire table. Enjoy running SELECT ... FROM a INNER JOIN b WHERE DATEPART(a.date, month) = b.whatever on a table with 500 million rows in it.
I think I'd be able to do all of these in my daily work, probably not as efficiently, with minor references to syntax guides (I don't use window functions often enough).<p>In an interview, presumably my logic would, hopefully, shine through minor issues of syntax.<p>Where would that put me? Maybe "okay to decent" when dealing with "medium-hard" questions?<p>I would fail utterly at DBA management SQL and stored procedures, my responsibilities skew towards data analysis.
SQL is better than 99% of the nosql alternatives out there.<p>But one thing it falls apart are these time series data processing tasks.<p>It's because of its model of unordered sets (or multisets to be more precise, but still undordered). When you look at majority of those queries and other real life queries they always involve the dimension of time. Well - then that means we have a natural order of the data - why not use an array data structure instead of this unordered bag and throw the sort order out of the window.<p>SQL realized this and bolted window functions on top of the original relational model. But you still feel its inadequacy when trying to do simple things such as (x join x on x.t=x.t-1) or even the infamous asof joins where you do (x join y on x.t <= y.t).<p>In array databases with implicit time order both examples are a trivial linear zip operation on the the sorted tables.<p>In traditional set oriented SQL it results in horrible O(n^2) cross join with a filter or in the better case od equijoin in sort merge join which still has to do two expensive sorts on my massive time series tables and then perform the same trivival linear zip that the array processor would do. Which is also completely useless on unbounded streams.<p>Also many stackoverflow answers suggest to use binary blobs for time series data and process them locally in your favorite programming language - which points at wrong conceptual model of SQL engines.<p>Is SQL really so inadequate for real life data problems or have I been living under a rock and Kdb is no longer the only option to do these trivial time oriented tasks?
>After an interview in 2017 went poorly — mostly due to me floundering at the more difficult SQL questions they asked me<p>Could be you dodged a bullet. A company with advanced interview questions may have some ugly SQL code. For jobs that lean heavily on SQL, I expect candidates to know things like windowing & dynamic variables, in SQL & an ORM library. For SWE's, I feel basic SQL is fine.
The problem with hard SQL problems is that they are often one of two camps:<p>1. use some underlying implementation detail of the particular RDBMS or proprietary extension to the standard<p>2. are essentially tricks, like the typical ridiculous interview problem designed to "make you think outside the box". Yes, you can do almost anything in SQL but often you should not.<p>I get the perspective here is data analysis where you probably need to know more SQL than the standard developer, but I still feel you should be testing for solid understanding of basics, understanding of the relational algebra concepts being used and awareness of some typical gotchas or bad smells. That's it. They'll be able to google the precise syntax for that tricky by date query when you're not guaranteed to have data for every month or whatever on-demand.
A bit off-topic, but this is another one of those sites that show nothing without JS, and looking at the source reveals something a little more amusing than usual: content that's full of nullness.
I'd ding the over-use of CTEs, when subselects are often more appropriate and better-performing. Kind of a "every problem a nail" thing going on here.
Out of curiosity: what is the use case for SQL window functions in application programming? Unlike most SQL, it doesn’t seem to reduce the order of the data coming back from the server, nor do anything especially faster than can be done on the client - and has the disadvantage of extra load on the database (which is harder to scale).<p>Is it only useful for ad hoc/analytical queries, or am I missing something?
Off topic but... anyone knows what they used for the rich text editor?<p>It uses React but I imagine there is some other library like ProseMirror here.
This is more like "we want to make sure you can use recursive CTE" questions. To add some variety to medium-"hard" SQL questions you could add some lateral joins, window functions (especially lag if you want to get creative) and compound logic statements in where clauses.
For creating a table of dates, what are our thoughts on:<p>select * from
(select adddate('1970-01-01',t4.i<i>10000 + t3.i</i>1000 + t2.i<i>100 + t1.i</i>10 + t0.i) selected_date from
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where selected_date between '2016-01-01' and now()<p>This works in MySQL / MariaDB.
some of the problems with SQL, is it was written to solve problems when hardware was expensive. BCF, n all the normal forms etc. when doing analytics you want a flat table that's it. & when working with a flat table for analytics they're other tools better for analysis than sql e.g pandas. or sql like language used by column databases. once you've a flat table, you no longer have to do joins etc.
I’ve done my fair share of complex sql queries and complex data migrations, asking about JOIN during a random interview is unfair unless documentation.
Every time I see an article such as this it reminds me how much I deeply abhor SQL. It is an ugly language, closer in feel to COBOL than something that can at times approach elegance, like Ruby or Scala. With languages like those, you can loo at your work after you are done and be proud of it beyond its purely functional aspect. SQL never elicits a response beyond “the task is finished and it does what I want”, typically with a “finally” in there somewhere.
We've changed the URL from <a href="https://gstudent.quip.com/2gwZArKuWk7W" rel="nofollow">https://gstudent.quip.com/2gwZArKuWk7W</a> to what that redirects to.