Learn how to design large-scale systems

1273 points by donnemartin almost 7 years ago

23 comments

ris almost 7 years ago

I'm quite tired of everyone wanting to build "large scale systems" and play at being Netflix. The truth of the matter is that the vast majority of people will never need to do this with their project and will instead just end up making an expensive-to-maintain mess with way too many moving parts.

At least as important as designing something that can scale up is designing something that can scale down. You don't know when the organization is going to need to deprioritize this project and keep it running without burning a couple of million in resources every year.

See: microservices. (As in, for the problem, not the solution.)

cjhanks almost 7 years ago

I see something comparable to these diagrams (it feels like) a half-dozen times a year.

The architecture is in general 'fine'. But the communication paths between subsystems are probably the easiest part of the problem. And in general, re-organizing the architecture of a system is usually possible - if and only if - the underlying data model is sane.

The more important questions are:

- What is the convention for addressing assets and entities? Is it consistent and useful for informing both security and data routing?
- What is the security policy for any specific entity in your system? How can it be modified? How long does it take to propagate that change? How centralized is the authentication?
- How can information created from failed events be properly garbage collected?
- How can you independently audit consistency between all independent subsystems?
- If a piece of "data" is found, how complex is it to find the origin of this data?
- What is the policy/system for enforcing that subsystems have a very narrow capability to mutate information?

If you get these questions answered correctly (amongst others not on the tip of my tongue), you can grow your architecture from a monolith to anything you want.

madethemcry almost 7 years ago

Oh interesting, I have never seen Anki (https://apps.ankiweb.net/) being used for large blocks of source code.

Anki is an open source application (desktop + mobile) for spaced repetition learning (aka flashcards). It's a very popular tool among people who want to learn languages (and basically anything else you want to remember). There are many shared decks (https://ankiweb.net/shared/decks/). Creating and formatting your own cards is also possible and pretty easy.

If you are planning to learn a language or anything else, give Anki a try. I used it for all of my language learning efforts. With this, at least my vocabulary is rock solid.

bandwitch almost 7 years ago

If you liked this page, you might also like the excellent book "Designing Data-Intensive Applications", which, among other things, surveys and presents many characteristics of large-scale systems. Note that it's not a book for preparing you for system design questions, but it can definitely help.

agentultra almost 7 years ago

I'd add a section on using TLA+ as a design tool. Diagrams and rules of thumb are useful but they don't catch errors or help you discover the correct architecture. See the Amazon paper [0] on their use of TLA+ in designing (and trouble-shooting) services.

[0] https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf

Zeebrommer almost 7 years ago

Can we please come up with a more specific name for this type of expertise? A large-scale system can mean anything from a social security system to a rocket. I was a bit disappointed that it only concerns websites here (though I'm aware that I'm browsing HN).

mabbo almost 7 years ago

This design, roughly, is being used very widely and is well-documented everywhere. But does anyone know of any lesser-known yet equally functional designs that work at the same scale?

Are there cases this design does not work for?

e12e almost 7 years ago

Interesting how the write API doesn't appear to invalidate/update the memory cache in the first diagram.

Still recommend people read Fielding's REST thesis - as it demonstrates a lot of possible architectures (e.g. fat client or what we today call SPAs) - not simply REST. Along with some trade-offs. (REST is mainly motivated by the simplicity of a simple hypertext application coupled with easy multi-level caching.)

https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

For a preview of SPAs before the prevalence of JavaScript, see 3.5, in particular 3.5.3 "code on demand":

https://www.ics.uci.edu/~fielding/pubs/dissertation/net_arch_styles.htm#sec_3_5

And keep in mind the text is from 2000. Early Ajax was introduced in IE in 1999, and late 2000 in Mozilla - but it took a while for Ajax to become standardized...
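
To illustrate the cache point at the top of this comment, here is a minimal sketch of a write path that invalidates a cache-aside memory cache. It is not taken from the linked diagrams: the `Store` class, its method names, and the use of plain dictionaries in place of a real database and cache are all assumptions made for illustration.

```python
class Store:
    """Hypothetical store: dicts stand in for the database and memory cache."""

    def __init__(self):
        self._db = {}      # stand-in for the persistent database
        self._cache = {}   # stand-in for the memory cache

    def read(self, key):
        # Cache-aside read: serve from the cache, fall back to the database
        # and populate the cache on a miss.
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def write(self, key, value):
        # Write to the database first, then drop the cached entry so the
        # next read cannot return the stale value.
        self._db[key] = value
        self._cache.pop(key, None)


store = Store()
store.write("user:1", "alice")
store.read("user:1")             # miss, fills the cache
store.write("user:1", "alicia")  # invalidates the cached entry
print(store.read("user:1"))      # reads the fresh value from the database
```

Without the `pop` in `write`, the final read would keep returning the old value until the entry expired or was evicted, which is the gap the comment points at.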

yread almost 7 years ago

I hoped this would help me with a problem I have: I'm coding a web app with a smallish database (<1GB for the next few years, <1% writes). I need low latencies for accessing it, and I would like to have multiple servers around the world sharing the database.

bovermyer almost 7 years ago

You know what hasn't been done? A blog post about how to make a service that fulfills the needs of most people most of the time.

All of the online and print material about such things focuses on how to achieve massive scale correctly. Don't get me wrong; this is valuable and, generally, sound advice.

However, it also ignores the majority of use cases for software.

I would love to see a blog post here from someone who has solved a very specific problem for a very small audience, and gotten a very enthusiastic response. That would be meaningful on a larger scale for me.

NightlyDev almost 7 years ago

I find it fun to tinker with high performance and high scalability designs, but I, like most others, have no need for it.

Start out small, make efficient systems and have scalability in the back of your head when doing so. Don't do as so many others do: "Oh, this lib seems popular, let's just use that! Heck, the cart sometimes takes 8 minutes to load, we need to add more nodes on AWS!"

Yeah, stuff like that happens.

At least in my book, optimization usually beats scalability as the place to start for more performance.

stvnw almost 7 years ago

Is there something similar for designing scalable front-end systems that goes into deep discussion of how certain companies resolve similar issues at scale? I'd be interested if there is a resource like that out there. Everything out there tailored to systems design and architecture is entrenched in backend components.

robax almost 7 years ago

As a junior dev who one day wants to be in a senior position, this is super helpful. I failed the system design portion of the triplebyte interview and this would have been invaluable to me. Thank you!

nwsm almost 7 years ago

This is a nice followup to the web architecture post yesterday

d-- almost 7 years ago

I'm teaching an intro distributed systems class and would like to share this with my students. I was wondering about how general the linked interview prepwork is. Are the Anki cards and sample interview questions mostly from large companies (FB, Google, MS) or also applicable to interviewing at smaller places?

At first look, it seems like these are fairly general questions, which is great.

geggam almost 7 years ago

No database access layer?

squegles almost 7 years ago

This is a great outline for studying before interviews. I recently studied off of this and can say it contributed to my success in SRE/Infra interviews. Highly recommended!

ex_amazon_sde almost 7 years ago

Most of this stuff would not pass a design review at Amazon.

Anything that requires a fleet of (relational) databases to ensure consistency will not work on a global scale.

tanilama almost 7 years ago

Large-scale in what sense? A web service running many instances doesn't by itself indicate complexity.

pier25 almost 7 years ago

Does anyone know what software is being used to draw the diagrams?

visviva almost 7 years ago

*Software systems

mozumder almost 7 years ago

What kind of numbers are they talking about for it to be "large-scale"?

One well designed fast app server can serve 1000 requests per second per processor core, and you might have 50 processor cores in a 2U rack, for 50,000 requests per second. For database access, you now have fast NVMe disks that can push 2 million IOPS to serve those 50,000 accesses.

50,000 requests per second is good enough for a million concurrent users, maybe 10-50 million users per day.

If you have 50 million users per day, then you're already among the largest websites in the world. Do you really need this sort of architecture for your startup system?

If anything, you'd probably need a more distributed system that reduces network latencies around the world, instead of a single scale-out system.
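
The arithmetic in this comment can be checked with a short back-of-envelope script. The per-core throughput, core count, and daily user figures are the commenter's; the function names are made up for illustration, and the calculation assumes the server could sustain its peak rate evenly for a full day, which real traffic does not do.

```python
# Figures quoted in the comment above.
REQUESTS_PER_SECOND_PER_CORE = 1_000
CORES_PER_2U_SERVER = 50
SECONDS_PER_DAY = 24 * 60 * 60


def server_capacity(rps_per_core, cores):
    """Peak requests per second for one server."""
    return rps_per_core * cores


def requests_per_user_per_day(total_rps, daily_users):
    """Daily request budget per user, assuming evenly spread load."""
    return total_rps * SECONDS_PER_DAY / daily_users


rps = server_capacity(REQUESTS_PER_SECOND_PER_CORE, CORES_PER_2U_SERVER)
print(f"capacity: {rps:,} requests/second")

for daily_users in (10_000_000, 50_000_000):
    budget = requests_per_user_per_day(rps, daily_users)
    print(f"{daily_users:,} users/day -> {budget:.1f} requests per user per day")
```

Under these assumptions the single server handles 4.32 billion requests per day, so 50 million daily users would each get a budget of roughly 86 requests. Peaks well above the daily average would shrink that budget accordingly.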

sillysaurus3 almost 7 years ago

Note that HN, a top-1000 site in the US, runs on a single box via a single Racket process.

"The key to performance is elegance, not battalions of special cases."
