High-level observations:<p>1. Business-level constraints (time, human, fiscal and other resources, stakeholders) trump technical constraints <i>every time</i>. Identifying these should be <i>step zero</i> in any design process.<p>2. A business-level risk model assists with appropriate design with respect to both security and availability and should ultimately drive component selection.<p>3. The content seems very much focused on public IP services provided through multiple networked subsystems. While this is a very popular category of modern systems design, not all systems fall into this category (e.g. embedded), and even if they do, many complex systems are internal, and public-facing interfaces are partly shielded/outsourced (Cloudflare, AWS, etc.).<p>4. Existing depth in areas such as database replication could perhaps be grouped in a generic fashion as examples of fault tolerance and failure / issue-mitigation strategies.<p>5. Asynchronicity and communication could be grouped together under architectural paradigms (e.g. state, consistency and recovery models), since they tend to define at least subsystem-local architectural paradigms. (Ever tried restoring a huge RDBMS backup or performing a backup between major RDBMS versions where downtime is a concern? What about debugging interactions between numerous message queues, or disparate views of a shared database (e.g. blockchain, split-capable orchestration systems) with supposed eventual consistency?)<p>6. Legal and regulatory considerations are often very powerful architectural concerns. In a multinational system with components owned by disparate legal entities in different jurisdictions, potential regulatory intervention (e.g. halt/seize/shut down national operations) can become a significant consideration.<p>7. The new/greenfield systems design perspective is a valid and common one.
However, equally commonly, established organizations' subsystems are (re-)designed/upgraded, and in this case system interfaces may be internal or otherwise highly distinct from public service design. Often these sorts of projects are harder because of downtime concerns, migration complexity and organizational/technical inertia.
I very much want to hear the words "failure isolation" during a systems design interview, usually as the answer to "Why did you break that functionality out into a separate service?" The answer should involve "independent scaling" and "failure isolation".
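To make "failure isolation" a bit more concrete, here is a minimal sketch of one common way to get it between services: a circuit breaker around calls to a dependency. This is my own illustration, not something from the guide; `CircuitBreaker`, `CircuitOpenError`, `flaky_recommendations` and `render_page` are all hypothetical names, and the thresholds are arbitrary.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is short-circuited instead of reaching the dependency."""


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period (illustrative only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to fail fast once open
        self.clock = clock                # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency unavailable, failing fast")
            # Cool-down elapsed: allow one trial call ("half-open").
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result


def flaky_recommendations(user_id):
    # Hypothetical stand-in for a remote call to a separate service.
    raise ConnectionError("recommendation service is down")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)


def render_page(user_id):
    try:
        recs = breaker.call(flaky_recommendations, user_id)
    except (ConnectionError, CircuitOpenError):
        recs = []  # degrade gracefully: the page still renders without recommendations
    return {"user": user_id, "recommendations": recs}
```

The point of the sketch is that the page keeps rendering when the recommendation service is down, and after a couple of failures it stops waiting on that service at all. That only works cleanly if the functionality was broken out behind its own boundary in the first place.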
Does "system" here mean "system of internet services"? I'm designing large systems and hope to learn more - but none of my systems have servers. Anywhere.
I recently gave interviews and did my preparation from <a href="https://www.educative.io/collection/5668639101419520/5649050225344512" rel="nofollow">https://www.educative.io/collection/5668639101419520/5649050...</a>. It was pretty useful.
Haven't read the entire guide yet, but I hope it has a few lines somewhere about over-engineering a solution. Yes, fault tolerance, asynchronicity, and individual scalability are virtues you want, but not for a super simple problem that mostly needs functional work. I've been in so many discussions with people who talk about all these virtues and spend too little time on making the core function do what it is supposed to do.
Brilliant work. I may convert this into a MkDocs-formatted project using the Material theme. I've done the same thing for the Open Guide to AWS, which I'm still working on. It vastly improves the readability and accessibility of the information.
That's amazing. Thank you for creating this. It's very useful for preparing for a system design interview.<p>I think you're missing "Show HN" in your post title.
This looks like a great guide, thanks! Makes me wonder how effective things like Google's App Engine are at autoscaling your web apps. "Serverless" code seems too good to be true.
Nice concept!<p>A missing area is identity management. Most likely this should be separated from your system (e.g. don't have a table somewhere with username and password in it).<p>In consumer-facing systems, OpenID Connect (the better option) is practiced by Google, while plain OAuth is used by most others.<p>In enterprise software, SAML is the common parlance.<p>That leads naturally to questions about API authorization (are API calls made on behalf of system users? If not, start probing further).<p>It's always enlightening to start asking questions about identity management very early on when designing systems.
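As a rough sketch of what "separated from your system" can look like in practice: the local user store keys records on the identity provider's issuer and subject and never holds a password. Everything here is hypothetical (`ExternalIdentity`, `UserDirectory`, `require_scope`), and it assumes the token's signature, issuer, audience and expiry were already verified by a proper OpenID Connect library before the claims reach this code. The space-separated `scope` claim is also an assumption about how the provider formats access tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalIdentity:
    issuer: str   # identity provider identifier (the OIDC "iss" claim)
    subject: str  # stable user id at that provider (the OIDC "sub" claim)


class UserDirectory:
    """Maps external identities to local user records; stores no passwords."""

    def __init__(self):
        self._users = {}

    def get_or_create(self, claims):
        # ASSUMPTION: `claims` comes from a token that has ALREADY been
        # verified elsewhere; this sketch does no cryptographic checking.
        identity = ExternalIdentity(claims["iss"], claims["sub"])
        if identity not in self._users:
            self._users[identity] = {
                "id": len(self._users) + 1,
                "email": claims.get("email"),
            }
        return self._users[identity]


def require_scope(claims, required_scope):
    """Raise PermissionError unless the verified token grants `required_scope`."""
    # ASSUMPTION: scopes arrive as one space-separated string in a "scope" claim.
    granted = set(claims.get("scope", "").split())
    if required_scope not in granted:
        raise PermissionError(f"missing scope: {required_scope}")


directory = UserDirectory()
claims = {
    "iss": "https://idp.example",  # placeholder issuer
    "sub": "abc123",
    "email": "a@example.com",
    "scope": "orders:read",
}
user = directory.get_or_create(claims)  # the same record comes back on every login
require_scope(claims, "orders:read")    # passes; "orders:write" would raise
```

Note that the same `sub` value from two different issuers maps to two different local users, which is why the key is the pair rather than the subject alone.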