> I don’t mind getting the naive solution first, but I really want to see my candidate having that *aha moment* that O(n²) is probably never good in any problem. And I want that aha moment to come pretty quickly and without hints. No great engineer should ever settle for an O(n²) algorithm, unless bound by memory or some other unmovable constraint.

But that is purely a cultural convention. O(n²) is great for many important problems. For example, parsing a sentence in natural language takes O(n³) time with the standard algorithms; getting it in O(n²) would be evidence that the candidate was a deity.

Why select for familiarity with the details and conventions of the interviewing process? How is that supposed to be helpful?

> Candidates then switch it around to have CustomerId as the Key, and PageId as the Value of the Map. But that’s not particularly great either because it overlooks the fact that you can have many pages per customer, not just one. Some candidates have the intuition that they need a Collection of pages as the Value of the Map

But this is wrong. You're completely fine having a map from CustomerId to a single PageId, because the problem statement specifies that you're collecting customers who have visited more than one page. If you process a record that says CustomerId = X, PageId = Y, and you look up CustomerId in your map, these are the possibilities:

1. map[CustomerId] has no entry. You write Y as the new entry for that CustomerId.

2. map[CustomerId] is already Y. You do nothing.

3. map[CustomerId] has an entry that is not Y. You now know that CustomerId represents one of the customers you're trying to find.

At no point did you need the map values to represent more than one page.
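A minimal sketch of those three cases in Java (the class, method, and variable names are made up for illustration; each record is a (CustomerId, PageId) pair):

    import java.util.*;

    class MultiPageCustomers {
        // Returns the customers that visited more than one unique page.
        static Set<String> findMultiPageCustomers(List<String[]> records) {
            Map<String, String> firstPage = new HashMap<>(); // CustomerId -> one PageId
            Set<String> multiPage = new HashSet<>();
            for (String[] rec : records) {
                String customerId = rec[0], pageId = rec[1];
                String seen = firstPage.get(customerId);
                if (seen == null) {
                    firstPage.put(customerId, pageId); // case 1: no entry yet
                } else if (!seen.equals(pageId)) {
                    multiPage.add(customerId);         // case 3: a second unique page
                }                                      // case 2: same page, do nothing
            }
            return multiPage;
        }
    }

One pass, O(n) time, and the map values never hold more than one page.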
> The condition *“customers that visited at least 2 unique pages”* tends to be a little harder for candidates to get right, so if they’re stuck I throw a little hint: you have a Set of pages from Day1, and a single page from Day2… how can you determine that this is at least two unique pages?

> Poor candidates will loop through the elements in the Set to check if the page from Day2 is in there. This turns your O(n) algorithm into O(n²) again. The number of candidates who have done this is surprising.

> Better candidates will do a *.contains()* on the Set, which is an O(1) operation on a hash set. But there is a catch with the logic.

> The intuition to get this right is this: If you are inside that If loop and the customer visited at least two pages in Day1, and they visited any page in Day2, they’re loyal, regardless of which page they visit in Day2. Otherwise, they visited only one page in Day1, so the question is: is this a different page? If so, they’re loyal; else it’s a duplicate, so you don’t know and should keep going. So your If statement has an Or:

> [code sample involving the interviewer's terrible solution]

> There’s a need for attention to detail, like using “>” instead of “>=” or missing the “!” in the second statement. I saw these fairly often. I didn’t worry. Great candidates spotted them quickly as they double-checked the algorithm when they were done. Good candidates spotted them after a little bit of hinting. That gave me a good signal on *debugging skills*.

Why in the world is this presented as a desirable solution? You have a Set of visited pages from day 1 and a single visited page from day 2. You want to know whether the total number of visited pages is more than 1.

Add the page from day 2 to the Set [O(1)], and then count the Set [also O(1)] (see the first sketch below).

> *What if you pre-processed the files and sorted them by CustomerId, then by PageId?*

> If the files are sorted, then the problem is easier and it’s just a two-pointer algorithm that you can execute in O(n) with O(1) of memory.

> Since the second sort key is by PageId, you follow another two-pointer algorithm to determine that there are at least two unique pages. So it’s a 2-pointer algorithm within a 2-pointer algorithm. It’s kind of a fun problem! I’ll leave the actual implementation as an exercise for the viewer.

> If you want to make the problem even more interesting, you can add a third file. I will leave that as an exercise for the reader as well!

If you're preprocessing the files, why not concatenate them before (or while) sorting them? The asymptotic resource requirements are the same, and you end up with *one* file that can be processed in O(n) time and O(1) space. (Though the result of the algorithm necessarily takes up O(n) space, so I'm not sure how much this should count as an improvement in terms of space requirements...)

This additional preprocessing step makes the generalization to three files trivial. The algorithm is identical: concatenate the files, sort the monofile, and then walk through the sorted entries (see the second sketch below).
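First sketch: the add-then-count check, in Java. The names (`day1Pages`, `day2Page`) are mine; this replaces the interviewer's Or-of-conditions entirely:

    import java.util.Set;

    class LoyaltyCheck {
        // True if the customer saw at least two unique pages across both days.
        // Note: mutates day1Pages; copy it first if the set is needed again.
        static boolean atLeastTwoUniquePages(Set<String> day1Pages, String day2Page) {
            day1Pages.add(day2Page);      // O(1) on a hash set
            return day1Pages.size() >= 2; // O(1); no ">" vs ">=" trap, no "!" to forget
        }
    }

There is no branching to get subtly wrong, which also removes the "attention to detail" trap the interviewer was grading on.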
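Second sketch: concatenate, sort, scan. One assumption worth making explicit: each record has to keep a tag saying which file (day) it came from, or the "visited on both days" half of the loyalty condition is lost in the merge. For brevity this sketch sorts in memory; the O(1)-space version would sort the files externally and stream the scan.

    import java.util.*;

    class ConcatenateSortScan {
        record Visit(String customerId, String pageId, int day) {}

        static List<String> loyalCustomers(List<Visit> day1, List<Visit> day2) {
            // "Concatenate the files", then sort by CustomerId, then PageId.
            List<Visit> all = new ArrayList<>(day1);
            all.addAll(day2);
            all.sort(Comparator.comparing(Visit::customerId)
                               .thenComparing(Visit::pageId));

            List<String> loyal = new ArrayList<>();
            int i = 0;
            while (i < all.size()) {
                // One linear pass over this customer's run of sorted records.
                String customer = all.get(i).customerId();
                int uniquePages = 0;
                boolean sawDay1 = false, sawDay2 = false;
                int j = i;
                while (j < all.size() && all.get(j).customerId().equals(customer)) {
                    if (j == i || !all.get(j).pageId().equals(all.get(j - 1).pageId())) {
                        uniquePages++; // pageId changed within the sorted run
                    }
                    if (all.get(j).day() == 1) sawDay1 = true; else sawDay2 = true;
                    j++;
                }
                if (uniquePages >= 2 && sawDay1 && sawDay2) loyal.add(customer);
                i = j;
            }
            return loyal;
        }
    }

A third file only changes the bookkeeping from two booleans to a small set of days seen; the scan itself is identical.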