> Google asked me two interview problems involving these back when I was in college. Have I ever needed this in practice? Nope!<p>This is the shame of Big Tech. I've used Bloom filters, tries, and many other of these multiple times on side projects with constraints, but would something like this ever make it into a deployed system at FAANG by the regular devs?<p>No, just use Dynamo, Spanner, or a bigger cluster + more workers, more queues, more k8s nodes, plus whatever cloud tech you should be using to "horizontally scale" and be "distributed".<p>Not that there aren't awesome devs at these companies who do code responsibly, but I don't often see it.<p>I can't believe how unperformant the systems I've often seen in Big Tech are. I want more people to read the article about fitting the whole level into a single byte[1] and stop using big budgets as an excuse to be irresponsible.<p>1: <a href="https://news.ycombinator.com/item?id=34095954" rel="nofollow">https://news.ycombinator.com/item?id=34095954</a>
Although it is somewhat embarrassing to admit this, when I wrote JACK [0] I was unaware of topological sort, mostly due to a lack of any formal CS education. This was a major error/failing on my part, because that codebase was more or less made for TS. There have been few other instances where a lack of formal CS training has been an impediment, but not knowing that there was a well-known algorithm for ordering node execution in a directed graph was ... bad.<p>[0] <a href="https://jackaudio.org/" rel="nofollow">https://jackaudio.org/</a>
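For anyone else who missed it in their education, the core idea is small. A minimal sketch of Kahn's algorithm (one standard way to topologically sort a directed acyclic graph, such as a node-execution graph):

```python
from collections import deque

def topo_sort(graph):
    """Kahn's algorithm. `graph` maps each node to the nodes it points to."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1
    # Start with nodes that have no incoming edges.
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for t in graph.get(node, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order
```

A nice side effect of this formulation is that the cycle check falls out for free: if any nodes never reach indegree zero, the graph wasn't a DAG.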
Wikipedia is sometimes enough to implement an algorithm. I implemented multiversion concurrency control and btrees from Wikipedia articles that do not have pseudocode. Being able to implement an algorithm from an English description is really useful, but it relies on the algorithm being well explained.<p>I also suggest trying to implement a btree; they're really useful data structures for data retrieval, and many database products use them extensively for indexes.<p>With any algorithm, I think there is a critical insight (which might be different for each person) where the algorithm clicks.<p>I didn't understand mergesort properly until I worked on implementing it, though I understood quicksort. When the underlying mental model of the problem being solved becomes clear, we can reason about the algorithm and write code to implement what we want to do.
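For me the mergesort "click" was seeing that the merge step is the whole trick: two sorted halves can be combined in linear time. A bare-bones version:

```python
def merge_sort(items):
    """Sort a list by recursively sorting halves and merging them."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge: repeatedly take the smaller head of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```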
At this point, everyone knows about bloom filters simply because they’re <i>the</i> thing to write about any time someone asks “lesser-known data structures/algorithms?” - it’s hard to find a list that <i>doesn’t</i> feature them!<p>Still some good entries in that list. I would add skip lists, not because they’re structurally anything special but because they open your mind to the possibly entirely-new-to-the-reader world of probabilistic data structures.<p>As someone who’s been around the block, I’ve learned to be less surprised by how often simply adding an element of randomness to a difficult problem can give you results comparable to a 10- or 100-times more complicated solution, with the advantage of being more maintainable, making better use of the i$, and avoiding a lot of work. (E.g. compare a cache dropping randomly selected items to LFU or LRU caches.)
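To make the cache example concrete, here is a toy sketch of random eviction; note there is no recency bookkeeping at all, which is exactly the point (the `random.choice` over keys is O(n) and kept only for brevity):

```python
import random

class RandomEvictionCache:
    """Fixed-capacity cache that evicts a uniformly random entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def get(self, key, default=None):
        # No access tracking needed, unlike LRU/LFU.
        return self.data.get(key, default)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = random.choice(list(self.data))  # O(n); fine for a sketch
            del self.data[victim]
        self.data[key] = value
```

Compare the bookkeeping an LRU needs (a linked list or ordered map touched on every access) against this: on workloads without strong locality the hit rates can be surprisingly close.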
Fun list! Other things I'd recommend trying:<p><pre><code> Huffman coding
Binary decision diagrams
Simplex method
General modeling of a problem to SAT
</code></pre>
The last one there is particularly fun. Finding a way to solve a problem fast by mapping it onto a comically large set of variables is surprisingly satisfying.
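As a taste of what "modeling a problem to SAT" means, here is a tiny CNF encoding of the constraint "exactly one of three variables is true", checked by brute force (a real workload would hand the clauses to a SAT solver instead):

```python
from itertools import product

# CNF encoding of "exactly one of x1, x2, x3 is true":
# one at-least-one clause plus pairwise at-most-one clauses.
# A positive literal is a variable index; a negative one is its negation.
clauses = [
    (1, 2, 3),    # x1 or x2 or x3
    (-1, -2),     # not (x1 and x2)
    (-1, -3),     # not (x1 and x3)
    (-2, -3),     # not (x2 and x3)
]

def satisfies(assignment, clauses):
    """True if every clause has at least one satisfied literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Enumerate all assignments and keep the satisfying ones (the "models").
models = [
    {1: a, 2: b, 3: c}
    for a, b, c in product([False, True], repeat=3)
    if satisfies({1: a, 2: b, 3: c}, clauses)
]
```

Real encodings (scheduling, Sudoku, verification) are built from exactly these kinds of building blocks, just with thousands of variables.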
I often find myself thinking: "But how would I do any of these without reaching for mutation, so that I can later parallelize them?" And then I come up empty. I still need to work through "Purely Functional Data Structures" by Okasaki. Hopefully then I will have an idea of how to come up with functional versions of mostly single-core algorithms, or at least of lock-requiring algorithms.<p>I skipped ahead once and looked at functional red-black trees, and the solution was to introduce one more color! How ingenious is that?! I have no idea how people arrive at these solutions.<p>In many places algorithms are described with mutation implicitly assumed. However, this can be very hard to parallelize later and might hit you in the future. In the age of many cores, this no longer seems appropriate to me. I wish we spent more time learning how to get things done without mutation.<p>Another great post that was linked at some point on HN was this: <a href="https://nullprogram.com/blog/2020/04/30/" rel="nofollow">https://nullprogram.com/blog/2020/04/30/</a> -- A comparatively simple change in approach and yet it allows for massive parallelization without locks! This is the kind of stuff we should learn more about, so we can make more use of our many available cores and move on from the single-core world of algorithms.
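The core trick behind most of Okasaki's structures is path copying: an "update" copies only the nodes on the path it touches and shares everything else with the old version. A toy illustration on an unbalanced BST (not a red-black tree, just the sharing idea):

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    """Persistent insert: copies only the root-to-leaf path it walks;
    every subtree off that path is shared between old and new trees."""
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present; return the old tree unchanged
```

Because the old tree is never mutated, readers on other threads can keep traversing it with no locks while a writer builds the new version.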
What I would be interested in is some kind of resource (book, video series, etc.) where someone presents a reasonably "real-world" problem, discusses the thought process in choosing an implementation, and steps through the process of implementing, testing, and iterating on the solution. It's easy to find sources for hard problems to solve, and it's easy to find sources for explanations of algorithms and data structures, but I haven't found much to bridge the gap.
This is related to one of my fun stories.
During an interview with one company, I was asked a topological sort question. I passed, but a couple of months later I needed to troubleshoot why a host app had a problem loading addons. I started digging and found that their implementation of topological sort was wrong…
I recommend <a href="https://web.stanford.edu/class/cs168/index.html" rel="nofollow">https://web.stanford.edu/class/cs168/index.html</a>
This is an interesting list. I'd add hashing and hash tables to the list. There are lots of interesting variations of these as well.<p>I would put hash tables ahead of piece tables because of their generality. For text editors I've always thought that gap buffers were so straightforward that ropes and piece tables seemed like extra unnecessary work. I see that VS Code uses piece tables, so perhaps someone can explain the performance differences to me.<p>I've never sat down and benchmarked various implementations of text buffers. However, it seems to me that memmove(3) on modern processors would be so fast that moving a buffer gap on even a 1GB file wouldn't be perceptible, and in return operations over the whole buffer like string searching or parsing would be much more efficient in a gap buffer because the program could count on the buffer being in contiguous memory locations (after closing the gap). In buffers implemented using a piece table or ropes, every character access involves extra layers of indirection. Furthermore, gap buffers are going to be more cache-friendly for the way that most editing operations take place in a text editor.<p>Perhaps someone who has actually done some comparisons could enlighten me.
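For readers unfamiliar with gap buffers, the idea fits in a few lines. This sketch uses two Python lists as stacks (the gap is implicit, between them), which is equivalent to the classic single-array-with-a-hole layout:

```python
class GapBuffer:
    """Two-stack gap buffer: `before` holds text left of the cursor,
    `after` holds text right of the cursor in reverse order."""

    def __init__(self, text=""):
        self.before = list(text)
        self.after = []

    def move_to(self, pos):
        # Shift characters across the gap until the cursor sits at `pos`.
        while len(self.before) > pos:
            self.after.append(self.before.pop())
        while len(self.before) < pos and self.after:
            self.before.append(self.after.pop())

    def insert(self, s):
        # Insertion at the cursor is amortized O(1) per character.
        self.before.extend(s)

    def delete(self, n=1):
        # Delete n characters to the right of the cursor.
        for _ in range(min(n, len(self.after))):
            self.after.pop()

    def text(self):
        return "".join(self.before) + "".join(reversed(self.after))
```

The cost model matches the comment above: edits near the cursor are cheap, and only large cursor jumps pay for moving the gap.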
I like algorithms and data structures. One of the things I personally find hard about algos and DS is that I don't know when to use which. For example, I could never have come up on my own with the idea that the traveling salesman problem can have a good solution via ant colony optimization or simulated annealing. How do I learn that skill? I still don't know when to use a graph for what kind of problem.<p>I hope someone can shed some light on this.
Nicely written. Hits a spot that a lot of instructional and documentary material misses - telling the reader up front what common problem each technique solves.
In my most recent job, my first real task was to implement a persistent (i.e. flash-disk-based) queue. In (micro)python. Seems simple enough, until real-world performance constraints kick in, and suddenly things get much, much more interesting. Need to keep flash happy, so wear-levelling is a concern; not a problem using LittleFS (as it's designed for flash disks), but that ends up taking massive amounts of time for seeking. And we have an upper bound on allowable time for an operation, which throws out any naive solution.<p>Implementing algorithms is easy. Implementing algorithms under constraints... not so much (but still fun)
Surprised HyperLogLog didn't get mentioned considering Bloom filters were such a popular suggestion.<p><a href="https://en.wikipedia.org/wiki/HyperLogLog" rel="nofollow">https://en.wikipedia.org/wiki/HyperLogLog</a>
I'd put Fourier transforms on the list of algorithms that are extremely useful to know. (Well, not exactly the algorithm itself as much as the concept.) Many things make a lot of sense if you understand Fourier transforms, such as computer graphics, audio processing, signal processing, image processing, data analysis, filtering, and electronics. But Fourier transforms get hardly any attention on HN.
The OP's list is decent. Here are my suggestions for moderately challenging algorithms to understand and implement:<p><pre><code> B-tree
Cooley-Tukey radix-2 FFT
CRC-32
DEFLATE decoder
Gauss-Jordan elimination
JSON parser
SHA-1 hash</code></pre>
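Of that list, CRC-32 is probably the gentlest entry point. A bit-at-a-time version using the standard reflected polynomial 0xEDB88320 (as in zlib, gzip, and PNG) fits in a few lines; production implementations use a precomputed lookup table for speed:

```python
def crc32(data: bytes) -> int:
    """Bitwise CRC-32 (reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # If the low bit is set, shift and fold in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

A handy property when implementing it yourself: the CRC-32 of the ASCII string "123456789" is the well-known check value 0xCBF43926, so you can validate against that before comparing with a library.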
> Segment tree<p>Note that there are at least two data structures with the same name (three if you also count interval tree): the one used in competitive programming for fast queries on subsegments of a changing array, and two used for storing segments/intervals on a line in slightly different ways. Wikipedia describes the latter.
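For reference, the competitive-programming variant (point updates, fast range queries over an array) is the more compact of the two. A sketch of the standard iterative form for range sums, with half-open query intervals:

```python
class SegmentTree:
    """Iterative segment tree: point updates and range-sum queries in O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        # Leaves live at indices n..2n-1; internal node i covers 2i and 2i+1.
        for i, v in enumerate(values):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum of values[lo:hi] (half-open interval)."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:   # lo is a right child: take it and step right
                total += self.tree[lo]
                lo += 1
            if hi & 1:   # hi is a right child: step left and take it
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total
```

Swapping `+` for `min`/`max` (and 0 for the matching identity) gives range-min/max queries with the same structure.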
I don't have a problem with the article. I do have a problem with the "every programmer". Not every programmer writes sophisticated algorithms. In fact, in my experience working on web sites, most programmers are front-end developers coding web sites or back-end developers shuffling data around. The most sophisticated algorithm they need is a for loop. Occasionally one needs an algorithm developer to work on scaling out.<p>Next time try "challenging algorithms and data structures every algorithm programmer should try."
Call me cynical, but every time I see interesting threads like these, I can't help but think that somewhere, a hiring manager is thinking to himself <i>"Nice, I'll add these to the technical (interview) questions"</i>