I'm quite tired of everyone wanting to build "large scale systems" and play at being Netflix. The truth of the matter is the vast, vast majority of people will never need to do this with their project and instead will just end up making an expensive-to-maintain mess with way too many moving parts.<p>At least as important as designing something that can scale up is designing something that can scale <i>down</i>. You don't know when the organization's going to need to deprioritize this project and be able to keep it running without burning a couple of million in resources every year.<p>See: microservices (as an example of the problem, not the solution).
I see something comparable to these diagrams (it feels like) a half-dozen times a year.<p>The architecture is in general 'fine'. But the communication paths between subsystems are probably the easiest part of the problem. And in general, re-organizing the architecture of a system is usually possible - if and only if - the underlying data model is sane.<p>The more important questions are:<p>- What is the convention for addressing assets and entities? Is it consistent and useful for informing both security and data routing?<p>- What is the security policy for any specific entity in your system? How can it be modified? How long does it take to propagate that change? How centralized is the authentication?<p>- How can information created from failed events be properly garbage collected?<p>- How can you independently audit consistency between all independent subsystems?<p>- If a piece of "data" is found, how complex is it to find the origin of this data?<p>- What is the policy/system for enforcing that subsystems have a very narrow capability to mutate information?<p>If you get these questions answered correctly (amongst others not on the tip of my tongue), you can grow your architecture from a monolith to anything you want.
Oh interesting, I have never seen Anki (<a href="https://apps.ankiweb.net/" rel="nofollow">https://apps.ankiweb.net/</a>) being used for large blocks of source code.<p>Anki is an open source application (desktop + mobile) for spaced repetition learning (aka flashcards). It's a very popular tool among people who want to learn languages (and basically anything else you want to remember). There are many shared decks (<a href="https://ankiweb.net/shared/decks/" rel="nofollow">https://ankiweb.net/shared/decks/</a>). Creating and formatting your own cards is also possible and pretty easy.<p>If you are planning to learn a language or anything else, give Anki a try. I used it for all of my language learning efforts. Thanks to it, at least my vocabulary is rock solid.
If you liked this page, you might also like the excellent book "Designing Data-Intensive Applications" that, among other things, surveys many characteristics of large-scale systems and presents some. Note that it's not a book for preparing you for system design questions, but it can definitely help.
I'd add a section on using TLA+ as a design tool. Diagrams and rules of thumb are useful but they don't catch errors or help you discover the correct architecture. See the Amazon paper [0] on their use of TLA+ in designing (and trouble-shooting) services.<p>[0] <a href="https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf" rel="nofollow">https://lamport.azurewebsites.net/tla/formal-methods-amazon....</a>
Can we please come up with a more specific name for this type of expertise? A large-scale system can mean anything from a social security system to a rocket. I was a bit disappointed that it only concerns websites here (though I'm aware that I'm browsing HN).
This design, roughly, is being used very widely and is well-documented everywhere. But does anyone know of any lesser-known yet equally functional designs that work at the same scale?<p>Are there cases this design does <i>not</i> work for?
Interesting how the write API doesn't appear to invalidate/update the memory cache in the first diagram.<p>I still recommend people read Fielding's REST thesis - as it demonstrates a lot of possible architectures (e.g. fat client, or what we today call SPAs) - not simply REST. Along with some trade-offs. (REST is mainly motivated by the simplicity of a simple hypertext application coupled with easy multi-level caching).<p><a href="https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm" rel="nofollow">https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm</a><p>For a preview of SPAs before the prevalence of JavaScript, see 3.5, in particular 3.5.3 "code on demand":<p><a href="https://www.ics.uci.edu/~fielding/pubs/dissertation/net_arch_styles.htm#sec_3_5" rel="nofollow">https://www.ics.uci.edu/~fielding/pubs/dissertation/net_arch...</a><p>And keep in mind the text is from 2000. Early Ajax was introduced in IE in 1999, and late 2000 in Mozilla - but it took a while for Ajax to become standardized...
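To make the cache point concrete, here is a minimal sketch of a write path that invalidates the cached entry, assuming a cache-aside read path. `CachedStore` and the two dicts standing in for the database and the memory cache are hypothetical names for illustration; nothing here is taken from the linked diagram.

```python
class CachedStore:
    """Toy cache-aside store; both backends are plain dicts (assumption)."""

    def __init__(self):
        self._db = {}     # stand-in for the real database
        self._cache = {}  # stand-in for the memory cache

    def read(self, key):
        # Serve from the cache when possible.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: fall back to the database and populate the cache.
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def write(self, key, value):
        # Write to the source of truth first.
        self._db[key] = value
        # Then drop the cached copy so the next read cannot see stale data.
        self._cache.pop(key, None)


if __name__ == "__main__":
    store = CachedStore()
    store.write("user:1", "alice")
    print(store.read("user:1"))  # populates the cache
    store.write("user:1", "bob")
    print(store.read("user:1"))  # re-read after invalidation
```

Without the `pop` in `write`, the second read would keep returning the old value until the entry expired or was evicted. Updating the cache in place instead of invalidating is the other common choice, at the cost of more care around concurrent writers.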
I hoped this would help me with this problem I have - I'm coding a web app with a smallish database (<1GB for the next few years, <1% writes). I need low latencies for accessing it. And I would like to have multiple servers over the world sharing the database.
You know what hasn't been done? A blog post about how to make a service that fulfills the needs of most people most of the time.<p>All of the online and print material about such things focuses on how to achieve massive scale correctly. Don't get me wrong; this is valuable and, generally, sound advice.<p>However, it also ignores the majority of use cases for software.<p>I would love to see a blog post here from someone who has solved a very specific problem for a very small audience, and gotten a very enthusiastic response. That would be meaningful on a larger scale for me.
I find it fun to tinker with high performance and high scalability designs, but I, like most others, have no need for it.<p>Start out small, make efficient systems and keep scalability in the back of your head when doing so. Don't do as so many others do: "Oh, this lib seems popular, let's just use that! Heck, the cart sometimes takes 8 minutes to load, we need to add more nodes on AWS!"<p>Yeah, stuff like that happens.<p>At least in my book optimization usually beats scalability as the place to start for more performance.
Is there something similar for designing scalable front-end systems, going into deep discussions about how certain companies resolve similar issues at scale? I'd be interested if there is a resource like that out there. Everything out there tailored to systems design and architecture is entrenched in backend components.
As a junior dev who one day wants to be in a senior position, this is super helpful. I failed the system design portion of the Triplebyte interview and this would have been invaluable to me. Thank you!
I'm teaching an intro distributed systems class and would like to share this with my students. I was wondering about how general the linked interview prepwork is. Are the Anki cards and sample interview questions mostly from large companies (FB, Google, MS) or also applicable to interviewing at smaller places?<p>At first look, seems like these are fairly general questions, which is great.
This is a great outline for studying before interviews. I recently studied off of this and can say it contributed to my success in SRE/Infra interviews. Highly recommended!
Most of this stuff would not pass a design review at Amazon.<p>Anything that requires a fleet of (relational) databases to ensure consistency will not work on a global scale.
What kind of numbers are they talking about for it to be "large-scale"?<p>One well designed fast app server can serve 1000 requests per second per processor core, and you might have 50 processor cores in a 2U rack, for 50,000 requests per second. For database access, you now have fast NVMe disks that can push 2 million IOPS to serve those 50,000 accesses.<p>50,000 requests per second is good enough for a million concurrent users, maybe 10-50 million users per day.<p>If you have 50 million users per day, then you're already among the largest websites in the world. Do you really need this sort of architecture for your startup system?<p>If anything, you'd probably need a more distributed system that reduces network latencies around the world, instead of a single scale-out system.
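A quick back-of-envelope check of the numbers above, as a sketch. The per-core and per-box figures are the ones from the comment; the function names are made up for illustration, and spreading the load evenly over 24 hours is an assumption (real traffic peaks, so the actual headroom is smaller).

```python
# Figures taken from the comment above.
REQS_PER_CORE_PER_SEC = 1_000
CORES_PER_BOX = 50
SECONDS_PER_DAY = 24 * 60 * 60


def box_throughput(reqs_per_core: int = REQS_PER_CORE_PER_SEC,
                   cores: int = CORES_PER_BOX) -> int:
    """Requests per second a single box can serve."""
    return reqs_per_core * cores


def requests_per_user_per_day(throughput_rps: int, daily_users: int) -> float:
    """Request budget per user per day, assuming perfectly even load."""
    return throughput_rps * SECONDS_PER_DAY / daily_users


if __name__ == "__main__":
    rps = box_throughput()
    print(f"box throughput: {rps} req/s")
    print(f"daily capacity: {rps * SECONDS_PER_DAY} requests")
    for users in (10_000_000, 50_000_000):
        budget = requests_per_user_per_day(rps, users)
        print(f"{users} users/day: {budget:.1f} requests per user per day")
```

So 50,000 requests per second is about 4.3 billion requests a day, which leaves roughly 86 to 432 requests per user per day across the 10-50 million user range before a second box is needed.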
Note that HN, a top-1000 site in the US, runs on a single box via a single Racket process.<p>"The key to performance is elegance, not battalions of special cases."