This is my paper (along with Tao and Fan). It's a great feeling to have this published and available, and I'm super proud of the team behind Physalia.<p>There's a lighter-weight introduction to the work here: <a href="https://www.amazon.science/blog/amazon-ebs-addresses-the-challenge-of-the-cap-theorem-at-scale" rel="nofollow">https://www.amazon.science/blog/amazon-ebs-addresses-the-cha...</a> and for those attending NSDI, I'll be talking about Physalia in the "Deployment Experience" session on Wednesday.
Thought for sure this'd be a thinkpiece on Excel in the enterprise ...<p>Seriously, though, this whole paper uses an amazing amount of terminology - blast radius, colony, color, game day, split brain - and an awesome biological metaphor of the Portuguese man o'war.<p>Great read even if you don't care about fault tolerance, CAP theorem, or distributed balancing at AWS-scale.<p>One sample quote on the value of cheap heuristics over full-blown number-crunching:<p>> Globally optimizing the placement of Physalia volumes is not feasible for two reasons, one is that it’s a non-convex optimization problem across huge numbers of variables, the other is that it needs to be done online because volumes and cells come and go at a high rate in our production environment. Figure 11 shows the results of using one very rough placement heuristic: a sort of bubble sort which swaps nodes between two cells at random if doing so would improve locality. In this simulation, we considered 20 candidates per cell. Even with this simplistic and cheap approach to placement, Physalia is able to offer significantly (up to 4x) reduced probability of losing availability.
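The quoted heuristic (swap nodes between two cells at random, keep the swap only if locality improves) is essentially a randomized hill-climbing local search. Here's a minimal sketch under made-up assumptions: each node is just a hypothetical 1-D position standing in for its place in the data center topology, each cell serves one client at a known position, and "locality" is the total distance from a cell's nodes to its client. The function names are illustrative, not from the paper, and the paper's "20 candidates per cell" parameter isn't modeled.

```python
import random
from typing import List

# Hypothetical model: a node is identified by a 1-D position (a stand-in for
# its location in the data center), and each cell serves one client position.


def cell_cost(nodes: List[int], client_pos: int) -> int:
    """Locality cost of one cell: total distance from its nodes to its client."""
    return sum(abs(n - client_pos) for n in nodes)


def total_cost(cells: List[List[int]], clients: List[int]) -> int:
    """Sum of locality costs over all cells."""
    return sum(cell_cost(c, p) for c, p in zip(cells, clients))


def improve_placement(cells: List[List[int]], clients: List[int],
                      steps: int, rng: random.Random) -> None:
    """Randomized pairwise-swap local search, modifying `cells` in place.

    Each step picks two distinct cells and one node from each at random,
    swaps them, and undoes the swap unless the two cells' combined cost drops.
    """
    for _ in range(steps):
        a, b = rng.sample(range(len(cells)), 2)
        i = rng.randrange(len(cells[a]))
        j = rng.randrange(len(cells[b]))
        before = cell_cost(cells[a], clients[a]) + cell_cost(cells[b], clients[b])
        cells[a][i], cells[b][j] = cells[b][j], cells[a][i]
        after = cell_cost(cells[a], clients[a]) + cell_cost(cells[b], clients[b])
        if after >= before:
            # No locality improvement: revert the swap.
            cells[a][i], cells[b][j] = cells[b][j], cells[a][i]


# Usage: two cells, each initially placed next to the *other* cell's client.
rng = random.Random(42)
clients = [0, 100]
cells = [[100, 101, 102], [0, 1, 2]]
print(total_cost(cells, clients))  # 600
improve_placement(cells, clients, steps=200, rng=rng)
print(total_cost(cells, clients))  # lower than before
```

Since only the two touched cells change cost and a swap is kept only when their combined cost strictly drops, total cost never goes up. Like any local search it can stall in a local optimum, which is the trade the quote describes: cheap and online rather than globally optimal.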
Abstract at: <a href="https://www.amazon.science/publications/millions-of-tiny-databases" rel="nofollow">https://www.amazon.science/publications/millions-of-tiny-dat...</a><p>> <i>...Physalia is a transactional key-value store, optimized for use in large-scale cloud control planes, which takes advantage of knowledge of transaction patterns and infrastructure design to offer both high availability and strong consistency to millions of clients. Physalia uses its knowledge of data center topology to place data where it is most likely to be available. Instead of being highly available for all keys to all clients, Physalia focuses on being extremely available for only the keys it knows each client needs, from the perspective of that client.</i><p>> <i>...We believe that the same patterns, and approach to design, are widely applicable to distributed systems problems like control planes, configuration management, and service discovery.</i><p>It'd be interesting to contrast this approach with the datastores behind Route53 or IAM, which need to be globally replicated with time-bounded, eventually consistent reads, and transactional but verifiable writes.<p>I hope AWS starts publishing about S3 next. One can look at the patents AWS engineers author to get a feel for some of the internals, but they are (intentionally?) hard to read.<p>For instance, patents filed by two of the many S3 founding engineers: <a href="https://patents.google.com/?inventor=James+Christopher+Sorenson%2c+III,Allan+H.+Vermeulen&oq=inventor:(James+Christopher+Sorenson%2c+III)+or+inventor:(Allan+H.+Vermeulen)" rel="nofollow">https://patents.google.com/?inventor=James+Christopher+Soren...</a><p>Also see:<p><a href="https://aws.amazon.com/builders-library/" rel="nofollow">https://aws.amazon.com/builders-library/</a><p><a href="https://research.google/pubs/" rel="nofollow">https://research.google/pubs/</a>
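The abstract's idea of being available "from the perspective of that client" can be illustrated with a toy model. This is a sketch under assumptions that are mine, not the paper's: replicas are identified only by a hypothetical rack number, a failure scenario is simply the set of racks the client can still reach, and a read or write needs a strict majority of the cell's replicas (majority-quorum replication). The scenarios and placements below are invented for illustration.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical model: each replica in a cell is identified by the rack it is on,
# and a failure scenario is the set of racks the client can still reach.
# Assumes majority-quorum replication (a strict majority of replicas needed).


def available_to_client(cell_racks: Sequence[int], reachable: FrozenSet[int]) -> bool:
    """True if this client can reach a strict majority of the cell's replicas."""
    reached = sum(1 for rack in cell_racks if rack in reachable)
    return reached > len(cell_racks) // 2


def availability_fraction(cell_racks: Sequence[int],
                          scenarios: List[FrozenSet[int]]) -> float:
    """Fraction of scenarios in which the cell is available to this client."""
    ok = sum(1 for s in scenarios if available_to_client(cell_racks, s))
    return ok / len(scenarios)


ALL_RACKS = frozenset(range(9))  # racks 0..8; the client lives on rack 0
scenarios = [
    ALL_RACKS,                   # no failures
    ALL_RACKS - {1},             # rack 1 fails
    ALL_RACKS - {5},             # rack 5 fails
    frozenset({0, 1, 2, 3}),     # partition: client only sees nearby racks
    frozenset({0, 1}),           # tighter partition
]

near = [0, 0, 1, 1, 2, 2, 3]     # 7 replicas on racks close to the client
far = [0, 2, 4, 5, 6, 7, 8]      # 7 replicas spread across the whole floor

print(availability_fraction(near, scenarios))  # 1.0
print(availability_fraction(far, scenarios))   # 0.6
```

Scenarios where the client's own rack is gone aren't listed, because from that client's perspective it doesn't matter whether the data is still available. A real placer would also have to weigh correlated failures (here `near` puts two replicas on each rack), which this toy only captures through the handful of scenarios it enumerates.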
Tangentially related: BigQuery uses a similar usage-based approach to placing and replicating data so that it's likely to be available to its users:<p><a href="https://cloud.google.com/blog/products/data-analytics/how-bigquery-zone-assignments-work" rel="nofollow">https://cloud.google.com/blog/products/data-analytics/how-bi...</a>
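As a rough illustration of "usage-based" placement in general (explicitly not BigQuery's actual algorithm, which the linked post describes), here's a sketch that picks replica zones by counting where accesses come from. The access log format, zone names, and `pick_zones` function are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def pick_zones(access_zones: Iterable[str], replicas: int = 2) -> List[str]:
    """Pick the `replicas` zones that issued the most accesses, busiest first.

    `access_zones` is a hypothetical log with one zone name per access.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(access_zones)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [zone for zone, _ in ranked[:replicas]]


log = ["zone-a"] * 5 + ["zone-b"] * 3 + ["zone-c"] * 3
print(pick_zones(log))  # ['zone-a', 'zone-b']
```

Counting raw accesses is the simplest possible signal; a real system would presumably also weigh capacity, recency, and failure-domain diversity.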