> Google asked me two interview problems involving these back when I was in college. Have I ever needed this in practice? Nope!<p>This is the shame of Big Tech. I've used Bloom filters, tries, and many other of these multiple times on side projects with constraints, but would something like this ever make it into a deployed system at FAANG by the regular devs?<p>No, just use Dynamo, Spanner, or a bigger cluster + more workers, more queues, more k8s nodes, plus whatever cloud tech you should be using to "horizontally scale" and be "distributed".<p>Not that there aren't awesome devs at these companies who do code responsibly, but I don't often see it.<p>I can't believe how unperformant the systems I've often seen in Big Tech are. I want more people to read the article about fitting the whole level into a single byte[1] and stop using big budgets as an excuse to be irresponsible.<p>1: <a href="https://news.ycombinator.com/item?id=34095954" rel="nofollow">https://news.ycombinator.com/item?id=34095954</a>
Although it is somewhat embarrassing to admit this, when I wrote JACK [0] I was unaware of topological sort, mostly due to a lack of any formal CS education. This was a major error/failing on my part, because that codebase was more or less made for TS. There have been few other instances where a lack of formal CS training has been an impediment, but not knowing that there was a well-known algorithm for ordering node execution in a directed graph was ... bad.<p>[0] <a href="https://jackaudio.org/" rel="nofollow">https://jackaudio.org/</a>
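For anyone else who missed it in their education, the core idea is small. A minimal sketch of Kahn's algorithm (one standard way to topologically sort a directed acyclic graph, such as a node-execution graph):

```python
from collections import deque

def topo_sort(graph):
    """Kahn's algorithm. `graph` maps each node to the nodes it points to."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1
    # Start with nodes that have no incoming edges.
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for t in graph.get(node, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order
```

A nice side effect of this formulation is that the cycle check falls out for free: if any nodes never reach indegree zero, the graph wasn't a DAG.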
Wikipedia is sometimes enough to implement an algorithm. I implemented multiversion concurrency control and btrees from Wikipedia articles that do not have pseudocode. Being able to implement an algorithm from an English description is really useful, but it relies on the algorithm being well explained.<p>I also suggest trying to implement a btree; they're really useful data structures for data retrieval, and many database products use them extensively for indexes.<p>With any algorithm, I think there is a critical insight (which might be different for each person) where the algorithm clicks.<p>I didn't understand mergesort properly until I worked on implementing it, though I understood quicksort. When the underlying mental model of the problem being solved becomes clear, we can reason about the algorithm and write code to implement what we want to do.
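For me the mergesort "click" was seeing that the merge step is the whole trick: two sorted halves can be combined in linear time. A bare-bones version:

```python
def merge_sort(items):
    """Sort a list by recursively sorting halves and merging them."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge: repeatedly take the smaller head of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```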
At this point, everyone knows about bloom filters simply because they’re <i>the</i> thing to write about any time someone asks “lesser-known data structures/algorithms?” - it’s hard to find a list that <i>doesn’t</i> feature them!<p>Still some good entries in that list. I would add skip lists, not because they’re structurally anything special but because they open your mind to the possibly entirely-new-to-the-reader world of probabilistic data structures.<p>As someone who’s been around the block, I’ve learned to be less surprised by how often simply adding an element of randomness to a difficult problem can give you results comparable to a 10- or 100-times more complicated solution, with the advantage of being more maintainable, making better use of the i$, and avoiding a lot of work. (E.g. compare a cache dropping randomly selected items to LFU or LRU caches.)
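To make the cache example concrete, here is a toy sketch of random eviction; note there is no recency bookkeeping at all, which is exactly the point (the `random.choice` over keys is O(n) and kept only for brevity):

```python
import random

class RandomEvictionCache:
    """Fixed-capacity cache that evicts a uniformly random entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def get(self, key, default=None):
        # No access tracking needed, unlike LRU/LFU.
        return self.data.get(key, default)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = random.choice(list(self.data))  # O(n); fine for a sketch
            del self.data[victim]
        self.data[key] = value
```

Compare the bookkeeping an LRU needs (a linked list or ordered map touched on every access) against this: on workloads without strong locality the hit rates can be surprisingly close.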
Fun list! Other things I'd recommend trying:<p><pre><code> Huffman coding
Binary decision diagrams
Simplex method
General modeling of a problem to SAT
</code></pre>
The last one there is particularly fun. Finding a way to solve a problem fast by mapping it onto a comically large set of variables is surprisingly satisfying.
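As a taste of what "modeling a problem to SAT" means, here is a tiny CNF encoding of the constraint "exactly one of three variables is true", checked by brute force (a real workload would hand the clauses to a SAT solver instead):

```python
from itertools import product

# CNF encoding of "exactly one of x1, x2, x3 is true":
# one at-least-one clause plus pairwise at-most-one clauses.
# A positive literal is a variable index; a negative one is its negation.
clauses = [
    (1, 2, 3),    # x1 or x2 or x3
    (-1, -2),     # not (x1 and x2)
    (-1, -3),     # not (x1 and x3)
    (-2, -3),     # not (x2 and x3)
]

def satisfies(assignment, clauses):
    """True if every clause has at least one satisfied literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Enumerate all assignments and keep the satisfying ones (the "models").
models = [
    {1: a, 2: b, 3: c}
    for a, b, c in product([False, True], repeat=3)
    if satisfies({1: a, 2: b, 3: c}, clauses)
]
```

Real encodings (scheduling, Sudoku, verification) are built from exactly these kinds of building blocks, just with thousands of variables.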
I often find myself thinking: "But how would I do any of these without reaching for mutation, so that I can later parallelize them?" And then I come up empty. I still need to work through "Purely Functional Data Structures" by Okasaki. Hopefully then I will have an idea of how to come up with functional versions of mostly single-core algorithms, or at least of lock-requiring algorithms.<p>I skipped ahead once and looked at functional red-black trees, and the solution was to introduce one more color! How ingenious is that?! I have no idea how people arrive at these solutions.<p>In many places algorithms are described with mutation implicitly assumed. However, this can be very hard to parallelize later and might hit you in the future. In the age of many cores, this no longer seems appropriate to me. I wish we spent more time learning how to get things done without mutation.<p>Another great post that was linked at some point on HN was this: <a href="https://nullprogram.com/blog/2020/04/30/" rel="nofollow">https://nullprogram.com/blog/2020/04/30/</a> -- A comparatively simple change in approach and yet it allows for massive parallelization without locks! This is the kind of stuff we should learn more about, so we can make more use of our many available cores and move on from the single-core world of algorithms.
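The core trick behind most of Okasaki's structures is path copying: an "update" copies only the nodes on the path it touches and shares everything else with the old version. A toy illustration on an unbalanced BST (not a red-black tree, just the sharing idea):

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    """Persistent insert: copies only the root-to-leaf path it walks;
    every subtree off that path is shared between old and new trees."""
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present; return the old tree unchanged
```

Because the old tree is never mutated, readers on other threads can keep traversing it with no locks while a writer builds the new version.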
What I would be interested in is some kind of resource (book, video series, etc.) where someone presents a reasonably "real-world" problem, discusses the thought process in choosing an implementation, and steps through the process of implementing, testing, and iterating on the solution. It's easy to find sources for hard problems to solve, and it's easy to find sources for explanations of algorithms and data structures, but I haven't found much to bridge the gap.
This is related to one of my fun stories.
During an interview with one company, I was asked a topological sort question. I passed, but a couple of months later I needed to troubleshoot why a host app had a problem loading addons. I started digging and found that their implementation of topological sort was wrong…
I recommend <a href="https://web.stanford.edu/class/cs168/index.html" rel="nofollow">https://web.stanford.edu/class/cs168/index.html</a>
This is an interesting list. I'd add hashing and hash tables to the list. There are lots of interesting variations of these as well.<p>I would put hash tables ahead of piece tables because of their generality. For text editors I've always thought that gap buffers were so straightforward that ropes and piece tables seemed like extra unnecessary work. I see that VS Code uses piece tables, so perhaps someone can explain the performance differences to me.<p>I've never sat down and benchmarked various implementations of text buffers. However, it seems to me that memmove(3) on modern processors would be so fast that moving a buffer gap on even a 1GB file wouldn't be perceptible, and in return operations over the whole buffer like string searching or parsing would be much more efficient in a gap buffer because the program could count on the buffer being in contiguous memory locations (after closing the gap). In buffers implemented using a piece table or ropes, every character access involves extra layers of indirection. Furthermore, gap buffers are going to be more cache-friendly for the way that most editing operations take place in a text editor.<p>Perhaps someone who has actually done some comparisons could enlighten me.
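For readers unfamiliar with gap buffers, the idea fits in a few lines. This sketch uses two Python lists as stacks (the gap is implicit, between them), which is equivalent to the classic single-array-with-a-hole layout:

```python
class GapBuffer:
    """Two-stack gap buffer: `before` holds text left of the cursor,
    `after` holds text right of the cursor in reverse order."""

    def __init__(self, text=""):
        self.before = list(text)
        self.after = []

    def move_to(self, pos):
        # Shift characters across the gap until the cursor sits at `pos`.
        while len(self.before) > pos:
            self.after.append(self.before.pop())
        while len(self.before) < pos and self.after:
            self.before.append(self.after.pop())

    def insert(self, s):
        # Insertion at the cursor is amortized O(1) per character.
        self.before.extend(s)

    def delete(self, n=1):
        # Delete n characters to the right of the cursor.
        for _ in range(min(n, len(self.after))):
            self.after.pop()

    def text(self):
        return "".join(self.before) + "".join(reversed(self.after))
```

The cost model matches the comment above: edits near the cursor are cheap, and only large cursor jumps pay for moving the gap.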
I like algorithms and data structures. One of the things I personally find hard about algos and DS is that I don't know when to use which. For example, I could never have come up on my own with the idea that the traveling salesman problem can have a good solution via ant colony optimization or simulated annealing. How do I learn that skill? I still don't know when to use a graph for what kind of problem.<p>I hope someone can shed some light on this.
Nicely written. Hits a spot that a lot of instructional and documentary material misses - telling the reader up front what common problem each technique solves.
In my most recent job, my first real task was to implement a persistent (i.e. flash-disk-based) queue. In (micro)python. Seems simple enough, until real-world performance constraints kick in, and suddenly things get much, much more interesting. Need to keep flash happy, so wear-levelling is a concern; not a problem using LittleFS (as it's designed for flash disks), but that ends up taking massive amounts of time for seeking. And we have an upper bound on allowable time for an operation, which throws out any naive solution.<p>Implementing algorithms is easy. Implementing algorithms under constraints... not so much (but still fun)
Surprised HyperLogLog didn't get mentioned considering Bloom filters were such a popular suggestion.<p><a href="https://en.wikipedia.org/wiki/HyperLogLog" rel="nofollow">https://en.wikipedia.org/wiki/HyperLogLog</a>
I'd put Fourier transforms on the list of algorithms that are extremely useful to know. (Well, not exactly the algorithm itself as much as the concept.) Many things make a lot of sense if you understand Fourier transforms, such as computer graphics, audio processing, signal processing, image processing, data analysis, filtering, and electronics. But Fourier transforms get hardly any attention on HN.
The OP's list is decent. Here are my suggestions for moderately challenging algorithms to understand and implement:<p><pre><code> B-tree
Cooley-Tukey radix-2 FFT
CRC-32
DEFLATE decoder
Gauss-Jordan elimination
JSON parser
SHA-1 hash</code></pre>
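Of that list, CRC-32 is probably the gentlest entry point. A bit-at-a-time version using the standard reflected polynomial 0xEDB88320 (as in zlib, gzip, and PNG) fits in a few lines; production implementations use a precomputed lookup table for speed:

```python
def crc32(data: bytes) -> int:
    """Bitwise CRC-32 (reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # If the low bit is set, shift and fold in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

A handy property when implementing it yourself: the CRC-32 of the ASCII string "123456789" is the well-known check value 0xCBF43926, so you can validate against that before comparing with a library.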
> Segment tree<p>Note that there are at least two data structures with the same name (three if you also count interval tree): the one used in competitive programming for fast queries on subsegments of a changing array, and two used for storing segments/intervals on a line in slightly different ways. Wikipedia describes the latter.
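For reference, the competitive-programming variant (point updates, fast range queries over an array) is the more compact of the two. A sketch of the standard iterative form for range sums, with half-open query intervals:

```python
class SegmentTree:
    """Iterative segment tree: point updates and range-sum queries in O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        # Leaves live at indices n..2n-1; internal node i covers 2i and 2i+1.
        for i, v in enumerate(values):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum of values[lo:hi] (half-open interval)."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:   # lo is a right child: take it and step right
                total += self.tree[lo]
                lo += 1
            if hi & 1:   # hi is a right child: step left and take it
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total
```

Swapping `+` for `min`/`max` (and 0 for the matching identity) gives range-min/max queries with the same structure.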
I don't have a problem with the article. I do have a problem with the "every programmer". Not every programmer writes sophisticated algorithms. In fact, in my experience working on web sites, most programmers are front-end developers coding web sites or back-end developers shuffling data around. The most sophisticated algorithm they need is a for loop. Occasionally one needs an algorithm developer to work on scaling out.<p>Next time try "challenging algorithms and data structures every algorithm programmer should try."
Call me cynical, but every time I see interesting threads like these, I can't help but think that somewhere, a hiring manager is thinking to himself <i>"Nice, I'll add these to the technical (interview) questions"</i>