> There are many reasons why an organization would need to build a distributed system, but here are two examples:<p>> - The demands of a consumer Web site/API or multitenant enterprise application simply exceed the computing capacity of any one machine.<p>> - An enterprise moves an existing application, such as a three-tier system, onto a cloud service provider in order to save on hardware/data-center costs.<p>When you've exhausted the capacity of a single machine, you typically don't jump straight to a distributed system. You can scale out horizontally with a stateless application layer, as long as the data storage on the backend can handle all the load. You can also scale database reads horizontally, using read replicas.<p>This horizontal scale-out is not a distributed system, since consensus (the "source of truth") still lives on one machine.<p>So I think a better phrasing of #1 would be "When your write load exceeds the computing capacity of any one machine".
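To make the read-replica point concrete, here is a minimal sketch of the routing a stateless application layer might do: every write goes to the one primary, and reads are spread round-robin across replicas. The `RoutingPool` class and the host names are hypothetical, made up for illustration, and the sketch ignores replication lag and failover.

```python
import itertools


class RoutingPool:
    """Send writes to the single primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        # The primary stays the only source of truth for writes.
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def for_write(self):
        # Every write lands on the same machine, however many app servers exist.
        return self.primary

    def for_read(self):
        # Round-robin over replicas; a read may see slightly stale data.
        return next(self._readers)


# Hypothetical host names, for illustration only.
pool = RoutingPool("db-primary", ["db-replica-1", "db-replica-2"])
```

Adding replicas raises read capacity, but `for_write` always returns the same host, which is why write load is the limit that eventually forces a distributed design.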
Regarding testing distributed systems: Chaos Monkey, as they mention, is awesome, and I also highly recommend getting Kyle to run Jepsen tests. But we still need more tools on this front, so we built <a href="https://github.com/gundb/panic-server" rel="nofollow">https://github.com/gundb/panic-server</a>, which integrates with Mocha (and other unit test frameworks) to make it easy to run failure scenarios across real and virtual machines. It has been a lifesaver for me.
> Geographies. Will this system be global, or will it run in "silos" per region?<p>Although I've only worked at Amazon for about a year, I've learned that you should always consider building siloed/regionalized applications. If you don't, expect major headaches when the service needs to be deployed in multiple environments.
This is a verbose paper written by a tech bro that over-generalizes "How To Build A Scaling High Availability Web App", but fails to explain how such systems are designed in general. If you're a tech startup and have never worked in this industry before, the very last paragraph is useful.<p>My personal ideal system design is one that you can pick up and drop into a single machine, a LAN, a cloud network, geographically dispersed colocated datacenters, etc., without relying on third-party service providers. If you go from a start-up to a billion-dollar company, you will eventually have offices with their own labs, dev, QA, middleware and ops teams, datacenters and production facilities, and your hardware and software service providers will run the gamut. If you can abstract the individual components of your system so that dependencies can be replaced live without any changes to any other part of the system, you have the start of a decent design.<p>However, nobody I've ever worked for designed their system this way initially, and they made millions to billions of dollars, so there certainly is no requirement that you have a perfect distributed system design for your emoji app start-up.
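One way to read "dependencies can be replaced live" is to put each dependency behind a small interface and hand callers an indirection point instead of the concrete backend. The sketch below is only an illustration under that assumption: `BlobStore`, `InMemoryStore`, and `StoreHandle` are hypothetical names, and it leaves out data migration, in-flight requests, and thread safety.

```python
from typing import Protocol


class BlobStore(Protocol):
    """The only contract the rest of the system is allowed to depend on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend; a disk, LAN, or cloud store would expose the same two methods."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get(self, key: str) -> bytes:
        return self._data[key]


class StoreHandle:
    """Indirection point: callers hold the handle, never the backend itself."""

    def __init__(self, backend: BlobStore) -> None:
        self._backend = backend

    def swap(self, backend: BlobStore) -> None:
        # Replace the dependency at runtime; callers need no changes.
        self._backend = backend

    def put(self, key: str, data: bytes) -> None:
        self._backend.put(key, data)

    def get(self, key: str) -> bytes:
        return self._backend.get(key)
```

Code that only ever sees a `StoreHandle` keeps working when `swap` points it at a different backend, which is the property that lets the same system move between a single machine and a datacenter.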