Medium-hard SQL interview questions

1230 points by thomzi12 about 5 years ago

37 comments

minimaxir about 5 years ago

SQL interview questions are an interesting counterpoint to stereotypical programming interviews: while typical algo questions in SWE interviews tend to test what's taught in academic contexts but has little real-world application, the questions in SQL interviews are more practical outside the whiteboard.

A weakness of these types of SQL questions, however, is that it's nearly impossible for the interviewer to provide help/guidance; the questions are often know-it-or-don't questions (especially those involving window functions w/ uncommon syntax). A data science interviewer years ago mocked me during a whiteboard test for not knowing the BETWEEN ROW syntax for a window function.

That said, as an IRL data scientist, the number of times I've had to do a SQL self-join in the past few years can be counted on one hand. And the BETWEEN ROW syntax for a window function.
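
For readers who have not met the frame clause mentioned above (written `ROWS BETWEEN ... AND ...` in standard SQL), here is a minimal sketch. It runs the query through Python's built-in sqlite3 module and assumes the bundled SQLite is 3.25 or newer, which is needed for window functions. The `daily_sales` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

# For each row, sum the current row and the two rows before it (by day).
moving_sums = conn.execute(
    """
    SELECT day,
           SUM(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS sum_last_3
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

print(moving_sums)
```

The frame is what turns a plain running aggregate into a sliding one; without it, the default frame would extend back to the first row.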

danbmil99 about 5 years ago

This is idiotic. Why in the world would testing for rote memorization of something anyone can look up easily be a reasonable filter for talent and experience in a programming role?

A friend of mine did numerous interviews at a large company, hours of his time and the interviewers', only to be caught up by some inane SQL question asked by a know-nothing after the entire process of interviews had been completed.

Why not ask about obscure regex expressions? Better yet, how about baseball scores? Hair bands from the 80s?

It's time for the valley to get real about how to judge the merit of applicants. The present state of affairs in tech recruiting is a joke.

deepsun about 5 years ago

Checked just the first two answers:

1. MoM Percent Change: It's better to use windowing functions; I believe it should be faster than a self-join.

2. It seems that the first solution is wrong -- it returns whether "a"'s parent is Root/Inner/Leaf, not "a" itself. I'd instead add a "has_children" column to the table, and then it would be clear. The second solution works, but without optimization it's 1 query per row due to the nested query -- slow, but not mentioned.
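
A minimal sketch of the window-function approach to month-over-month change, using `LAG` instead of a self-join. It runs through Python's sqlite3 module (assuming SQLite 3.25+). The `monthly_mau` table, its columns, and the numbers are assumptions for illustration, not the guide's actual schema, and the sketch makes no claim about relative speed.

```python
import sqlite3

# Hypothetical pre-aggregated table: one row per month with its MAU count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_mau (month TEXT, mau INTEGER)")
conn.executemany(
    "INSERT INTO monthly_mau VALUES (?, ?)",
    [("2020-01", 1000), ("2020-02", 2000), ("2020-03", 1500)],
)

# LAG(mau) reads the previous month's value in month order,
# so no self-join is needed. The first month has no predecessor (NULL).
mom_rows = conn.execute(
    """
    SELECT month,
           mau,
           LAG(mau) OVER (ORDER BY month) AS prev_mau,
           ROUND(100.0 * (mau - LAG(mau) OVER (ORDER BY month))
                 / LAG(mau) OVER (ORDER BY month), 2) AS pct_change
    FROM monthly_mau
    ORDER BY month
    """
).fetchall()

for row in mom_rows:
    print(row)
```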

fnord77 about 5 years ago

I think for a lot of people, SQL is a skill that doesn't stick.

You learn enough to do the queries you need for your project, they work, then you forget about them as you work on the rest of your project. These skills are perishable. Left outer join? Yeah, I knew what that was some time ago, but not anymore.

The days of dedicated SQL programmers are mostly gone.

Lightbody about 5 years ago

I think these are great. But I think there should be some representation around locking / concurrency / deadlock topics. Those tend to be the hardest because you can’t clearly recreate the right/wrong answer in a local test environment. Speaking as a person who waited far too long in his career to fully appreciate these topics, I wish I had been pushed to learn them much earlier.

thomzi12 about 5 years ago

Hey, HN! Since I couldn't find a good resource online for the self-join and window function SQL questions I've encountered over the years in interviews, I made my own study guide which I'm now sharing publicly as a Quip doc. Would love your feedback or thoughts!

oyoun about 5 years ago

I think SQL is a language every programmer should know. With the right query, you may solve a problem that may take 100 lines in other languages.

It is so useful, reliable, and does not change every year.

arh68 about 5 years ago

For the second one, it seems most natural to reach for exists, or something (I have not tried this code..)

<pre><code>select t.node,
       (case
          when t.parent is null then 'Root'
          when exists (select * from tree c where c.parent = t.node) then 'Inner'
          else 'Leaf'
        end) "label"
from tree t</code></pre>

EDIT: also, in the fourth, it seems like you'd want to partition the window function, who cares about order. Something like

<pre><code>sum(cash_flow) over (partition by date) "cumulative_cf"</code></pre>
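
A runnable sketch of the `EXISTS` approach to labelling tree nodes, executed through Python's sqlite3 module. The four-row `tree` table is made-up sample data. One detail worth noting: the outer table gets an alias (`t`) so that the correlated subquery compares against the outer row's node; an unqualified `node` inside the subquery would resolve to the inner table instead.

```python
import sqlite3

# Hypothetical sample tree: 1 is the root, 2 and 3 are its children,
# and 4 is a child of 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (node INTEGER, parent INTEGER)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)

# A node with no parent is the root; a node that appears as some other
# row's parent is an inner node; everything else is a leaf.
labels = conn.execute(
    """
    SELECT t.node,
           CASE
               WHEN t.parent IS NULL THEN 'Root'
               WHEN EXISTS (SELECT 1 FROM tree c WHERE c.parent = t.node)
                   THEN 'Inner'
               ELSE 'Leaf'
           END AS label
    FROM tree t
    ORDER BY t.node
    """
).fetchall()

print(labels)
```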

ryanisnan about 5 years ago

Great article. I can't help but feel like SQL is a poor choice for some of this stuff, though. More often than not, I find it much easier to pull the raw data into memory and use a higher-level language to do these sorts of queries. I am all for knowing the intricacies of SQL, as the cost of not knowing them can be very high, but I'm curious for your opinion here.

zozbot234 about 5 years ago

No questions/examples featuring recursive CTEs? They tend to come up in anything involving queries over trees or graphs. They're also a relatively new feature where having some examples to show how they work may be quite helpful.
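
As one small example of the kind of query meant here, the sketch below uses a recursive CTE to compute each node's depth in a tree. It runs through Python's sqlite3 module; the `tree` table and its rows are hypothetical sample data, not taken from the guide.

```python
import sqlite3

# Hypothetical sample tree: 1 is the root, 2 and 3 are its children,
# and 4 is a child of 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (node INTEGER, parent INTEGER)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)

# The anchor member selects the root at depth 0; the recursive member
# repeatedly joins children onto rows already found, adding 1 to depth.
depths = conn.execute(
    """
    WITH RECURSIVE walk(node, depth) AS (
        SELECT node, 0 FROM tree WHERE parent IS NULL
        UNION ALL
        SELECT t.node, w.depth + 1
        FROM tree t
        JOIN walk w ON t.parent = w.node
    )
    SELECT node, depth FROM walk ORDER BY node
    """
).fetchall()

print(depths)
```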

S_A_P about 5 years ago

I flip back and forth between deep diving into (in my case) SQL Server skills and .NET manipulation. In the world I live in, it makes the most sense to do set-based manipulation in SQL and logical entity-based logic in C#. I work in a unique enterprise niche that has about 4 options based on either Java or .NET. SQL knowledge definitely gives you a leg up for complex reporting, and there are cases where I love being able to debug super quickly when comparing inputs to outputs. However, when I run into a SQL script that is 5000+ lines long and have to debug it, I much prefer the .NET side of the fence. Should someone ever come up with a bridge that gives you .NET level of visibility into the active datasets in a SQL query I would pay them 4 figures without question...

namdnay about 5 years ago

Very interesting. I never really "got" declarative languages. I remember a very long time ago I was working with Oracle and you could see the "execution plan" for your SQL queries. I kept wondering "why can't I build my queries directly with this?" - it seems so much simpler to my brain than SQL itself.

vasilakisfil about 5 years ago

I always thought that I suck at SQL, but if these are medium to hard then I am not that bad actually.

gtrubetskoy about 5 years ago

One problem with this article is the number of times the solution involves COUNT(DISTINCT).

One of the best SQL interview questions is "Explain what is wrong with DISTINCT and how to work around it".

ridaj about 5 years ago

The first few answers are unidiomatic where I work. Analytical functions would be vastly preferable to self-joins, especially in the case of the join with an inequality that is proposed to compute running totals, which I assume would have horrible performance on large datasets.
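
To make the comparison concrete, here is a minimal sketch that computes the same running total both ways: with an analytic (window) function and with a self-join on an inequality. It runs through Python's sqlite3 module (assuming SQLite 3.25+). The `cash_flows` table, its column names, and its rows are assumptions for illustration, and the sketch only demonstrates that the two queries agree, not how they perform.

```python
import sqlite3

# Hypothetical table with one cash flow per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cash_flows (day TEXT, cash_flow INTEGER)")
conn.executemany(
    "INSERT INTO cash_flows VALUES (?, ?)",
    [("2018-01-01", -1000), ("2018-01-02", -100), ("2018-01-03", 50)],
)

# Window-function version: a running sum in day order.
running_window = conn.execute(
    """
    SELECT day, SUM(cash_flow) OVER (ORDER BY day) AS cumulative_cf
    FROM cash_flows
    ORDER BY day
    """
).fetchall()

# Self-join version: pair every row with all rows on or before its day,
# then aggregate per day.
running_self_join = conn.execute(
    """
    SELECT a.day, SUM(b.cash_flow) AS cumulative_cf
    FROM cash_flows a
    JOIN cash_flows b ON b.day <= a.day
    GROUP BY a.day
    ORDER BY a.day
    """
).fetchall()

print(running_window)
print(running_self_join)
```

The self-join pairs each row with every earlier row before aggregating, which is where the concern about large datasets comes from.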

hotsauceror about 5 years ago

RDBMS platforms without function indexes mean that some of these queries will force a row-by-row execution over your entire table. Enjoy running SELECT ... FROM a INNER JOIN b ON DATEPART(month, a.date) = b.whatever on a table with 500 million rows in it.

ineedasername about 5 years ago

I think I'd be able to do all of these in my daily work, probably not as efficiently, with minor references to syntax guides (I don't use window functions often enough).

In an interview, presumably my logic would, hopefully, shine through minor issues of syntax.

Where would that put me? Maybe "okay to decent" when dealing with "medium-hard" questions?

I would fail utterly at DBA management SQL and stored procedures; my responsibilities skew towards data analysis.

xupybd about 5 years ago

For me the hardest part of the first question is understanding the acronyms. I think MoM is month on month. But MAU, no idea.

snidane about 5 years ago

SQL is better than 99% of the nosql alternatives out there.

But one place where it falls apart is these time series data processing tasks.

It's because of its model of unordered sets (or multisets to be more precise, but still unordered). When you look at the majority of those queries and other real-life queries, they always involve the dimension of time. Well, then that means we have a natural order of the data. Why not use an array data structure instead of using this unordered bag and throwing the sort order out of the window?

SQL realized this and bolted window functions on top of the original relational model. But you still feel its inadequacy when trying to do simple things such as (x join x on x.t=x.t-1) or even the infamous asof joins where you do (x join y on x.t <= y.t).

In array databases with implicit time order, both examples are a trivial linear zip operation on the sorted tables.

In traditional set-oriented SQL it results in a horrible O(n^2) cross join with a filter or, in the better case of an equijoin, in a sort-merge join, which still has to do two expensive sorts on my massive time series tables and then perform the same trivial linear zip that the array processor would do. Which is also completely useless on unbounded streams.

Also, many stackoverflow answers suggest using binary blobs for time series data and processing them locally in your favorite programming language, which points at the wrong conceptual model of SQL engines.

Is SQL really so inadequate for real-life data problems, or have I been living under a rock and Kdb is no longer the only option to do these trivial time-oriented tasks?

iblaine about 5 years ago

>After an interview in 2017 went poorly, mostly due to me floundering at the more difficult SQL questions they asked me

Could be you dodged a bullet. A company with advanced interview questions may have some ugly SQL code. For jobs that lean heavily on SQL, I expect candidates to know things like windowing & dynamic variables, in SQL & an ORM library. For SWEs, I feel basic SQL is fine.

ollien about 5 years ago

This is neat, but it would be really helpful if these examples included some kind of sample output of what was expected.

nogabebop23 about 5 years ago

The problem with hard SQL problems is that they often fall into one of two camps:

1. They use some underlying implementation detail of the particular RDBMS or a proprietary extension to the standard.

2. They are essentially tricks, like the typical ridiculous interview problem designed to "make you think outside the box". Yes, you can do almost anything in SQL but often you should not.

I get the perspective here is data analysis where you probably need to know more SQL than the standard developer, but I still feel you should be testing for solid understanding of basics, understanding of the relational algebra concepts being used and awareness of some typical gotchas or bad smells. That's it. They'll be able to google, on demand, the precise syntax for that tricky by-date query when you're not guaranteed to have data for every month, or whatever.

userbinator about 5 years ago

A bit off-topic, but this is another one of those sites that show nothing without JS, and looking at the source reveals something a little more amusing than usual: content that's full of nullness.

sk5t about 5 years ago

I'd ding the over-use of CTEs, when subselects are often more appropriate and better-performing. Kind of an "every problem a nail" thing going on here.

cameronh90 about 5 years ago

Out of curiosity: what is the use case for SQL window functions in application programming? Unlike most SQL, it doesn’t seem to reduce the order of the data coming back from the server, nor do anything especially faster than can be done on the client - and has the disadvantage of extra load on the database (which is harder to scale).

Is it only useful for ad hoc/analytical queries, or am I missing something?

pier25 about 5 years ago

Off topic but... does anyone know what they used for the rich text editor?

It uses React but I imagine there is some other library like ProseMirror here.

nurettin about 5 years ago

This is more like "we want to make sure you can use recursive CTE" questions. To add some variety to medium-"hard" SQL questions you could add some lateral joins, window functions (especially lag if you want to get creative) and compound logic statements in where clauses.

beckingz about 5 years ago

For creating a table of dates, what are our thoughts on:

<pre><code>select * from
 (select adddate('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date from
  (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
  (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
  (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
  (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
  (select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where selected_date between '2016-01-01' and now()</code></pre>

This works in MySQL / MariaDB.
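
For comparison, here is a different technique for the same task: generating the dates with a recursive CTE rather than the digit cross join above. This is a sketch in SQLite's dialect, run through Python's sqlite3 module, and it is not the MySQL query from the comment. The start and end dates are arbitrary, and `date(d, '+1 day')` is SQLite's date arithmetic, which other engines spell differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Start from a fixed date and keep adding one day until the end date
# is reached. Both bounds are arbitrary values chosen for illustration.
date_rows = conn.execute(
    """
    WITH RECURSIVE dates(d) AS (
        SELECT '2016-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2016-01-05'
    )
    SELECT d FROM dates
    """
).fetchall()

selected_dates = [row[0] for row in date_rows]
print(selected_dates)
```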

mosburger about 5 years ago

Worth noting that this isn't all ANSI-SQL... e.g. I'm pretty sure WITH is a Postgres thing?

gigatexal about 5 years ago

Yup those questions are a stretch for me too! Love that though.

Where are the pivot and unpivot questions?

dzonga about 5 years ago

Some of the problems with SQL come from it being written to solve problems when hardware was expensive: BCNF and all the normal forms, etc. When doing analytics you want a flat table, that's it. And when working with a flat table for analytics, there are other tools better for analysis than SQL, e.g. pandas, or the SQL-like languages used by column databases. Once you have a flat table, you no longer have to do joins etc.

ojr about 5 years ago

I’ve done my fair share of complex SQL queries and complex data migrations; asking about JOIN during a random interview is unfair unless documentation is allowed.

cryptozeus about 5 years ago

It would be helpful to show output result for each.

anonfunction about 5 years ago

The first solution for MAU has the wrong sign for the percentage change column:

Previous MAU: 1000
Current MAU: 2000
Percent Change: -100
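
The arithmetic can be checked with a short sketch. The first function uses the usual definition of percent change; the second swaps the operands of the subtraction, which is one way to get -100 from these numbers. That swap is an assumption about the cause, not something confirmed from the guide, and both function names are made up for illustration.

```python
def percent_change(previous: float, current: float) -> float:
    """Usual definition: growth relative to the previous value."""
    return 100.0 * (current - previous) / previous


def percent_change_flipped(previous: float, current: float) -> float:
    """Hypothetical buggy variant with the subtraction reversed."""
    return 100.0 * (previous - current) / previous


# MAU doubling from 1000 to 2000 is a +100% change.
expected_sign = percent_change(1000, 2000)
# Reversing the subtraction flips the sign to -100.
flipped_sign = percent_change_flipped(1000, 2000)

print(expected_sign, flipped_sign)
```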

revscat about 5 years ago

Every time I see an article such as this it reminds me how much I deeply abhor SQL. It is an ugly language, closer in feel to COBOL than something that can at times approach elegance, like Ruby or Scala. With languages like those, you can look at your work after you are done and be proud of it beyond its purely functional aspect. SQL never elicits a response beyond “the task is finished and it does what I want”, typically with a “finally” in there somewhere.

sadhana1234 about 5 years ago

Good one.

dang about 5 years ago

We've changed the URL from https://gstudent.quip.com/2gwZArKuWk7W to what that redirects to.