Millions of Tiny Databases [pdf]

173 points by aratno over 5 years ago

4 comments

mjb over 5 years ago
This is my paper (along with Tao and Fan). It's a great feeling to have this published and available, and I'm super proud of the team behind Physalia.

There's a lighter-weight introduction to the work here: https://www.amazon.science/blog/amazon-ebs-addresses-the-challenge-of-the-cap-theorem-at-scale and for those attending NSDI, I'll be talking about Physalia in the "Deployment Experience" session on Wednesday.

kthejoker2 over 5 years ago
Thought for sure this'd be a thinkpiece on Excel in the enterprise ...

Seriously, though, this whole paper uses an amazing amount of terminology (blast radius, colony, color, game day, split brain) and an awesome biological metaphor of the Portuguese man o'war.

Great read even if you don't care about fault tolerance, CAP theorem, or distributed balancing at AWS-scale.

One sample quote of the value of cheap heuristics over full-blown number-crunching:

> Globally optimizing the placement of Physalia volumes is not feasible for two reasons, one is that it’s a non-convex optimization problem across huge numbers of variables, the other is that it needs to be done online because volumes and cells come and go at a high rate in our production environment. Figure 11 shows the results of using one very rough placement heuristic: a sort of bubble sort which swaps nodes between two cells at random if doing so would improve locality. In this simulation, we considered 20 candidates per cell. Even with this simplistic and cheap approach to placement, Physalia is able to offer significantly (up to 4x) reduced probability of losing availability.
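
For readers who want a feel for the heuristic in that quote, here is a minimal Python sketch of a random-swap placement loop. It is not the paper's simulation: the 1-D "distance to client" cost, the cell and node counts, and every function name (`cell_cost`, `improve_placement`, `demo`) are assumptions made purely for illustration, and the "20 candidates per cell" detail is not modeled.

```python
import random

def cell_cost(cell, client):
    """Toy locality cost: total 1-D distance from a cell's nodes to its client."""
    return sum(abs(node - client) for node in cell)

def total_cost(cells, clients):
    """Sum of the per-cell locality costs."""
    return sum(cell_cost(c, p) for c, p in zip(cells, clients))

def improve_placement(cells, clients, steps, rng):
    """Try random node swaps between two cells; keep a swap only if it lowers cost."""
    for _ in range(steps):
        a, b = rng.sample(range(len(cells)), 2)   # two distinct cells
        i = rng.randrange(len(cells[a]))          # one node from each cell
        j = rng.randrange(len(cells[b]))
        before = cell_cost(cells[a], clients[a]) + cell_cost(cells[b], clients[b])
        cells[a][i], cells[b][j] = cells[b][j], cells[a][i]
        after = cell_cost(cells[a], clients[a]) + cell_cost(cells[b], clients[b])
        if after >= before:
            # Not an improvement: undo the swap.
            cells[a][i], cells[b][j] = cells[b][j], cells[a][i]
    return cells

def demo(seed=0, n_cells=10, nodes_per_cell=5):
    """Build a random toy layout (arbitrary sizes) and return (cost before, cost after)."""
    rng = random.Random(seed)
    clients = [rng.randrange(100) for _ in range(n_cells)]
    nodes = [rng.randrange(100) for _ in range(n_cells * nodes_per_cell)]
    cells = [nodes[k * nodes_per_cell:(k + 1) * nodes_per_cell] for k in range(n_cells)]
    start = total_cost(cells, clients)
    improve_placement(cells, clients, steps=2000, rng=rng)
    return start, total_cost(cells, clients)
```

Calling `demo()` returns the total cost before and after the swaps. Because a swap is kept only when it strictly lowers the combined cost of the two cells involved, the total can never go up, which is the whole appeal of such a cheap, online-friendly heuristic over global optimization.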

ignoramous over 5 years ago
Abstract at: https://www.amazon.science/publications/millions-of-tiny-databases

> ...Physalia is a transactional key-value store, optimized for use in large-scale cloud control planes, which takes advantage of knowledge of transaction patterns and infrastructure design to offer both high availability and strong consistency to millions of clients. Physalia uses its knowledge of data center topology to place data where it is most likely to be available. Instead of being highly available for all keys to all clients, Physalia focuses on being extremely available for only the keys it knows each client needs, from the perspective of that client.

> ...We believe that the same patterns, and approach to design, are widely applicable to distributed systems problems like control planes, configuration management, and service discovery.

It'd be interesting to contrast this approach with Route53's or IAM's datastore which need to be globally-replicated with time-bounded eventually-consistent reads, and transactional but verifiable writes.

I hope AWS begins publishing about S3, now. One can look at the patents AWS engineers author to get a feel for some of the internals, but they are (intentionally?) hard to read.

For instance, patents filed by two of the many S3 founding-engineers: https://patents.google.com/?inventor=James+Christopher+Sorenson%2c+III,Allan+H.+Vermeulen&oq=inventor:(James+Christopher+Sorenson%2c+III)+or+inventor:(Allan+H.+Vermeulen)

Also see:

https://aws.amazon.com/builders-library/

https://research.google/pubs/

vimota over 5 years ago
Tangentially related, BigQuery uses a similar usage-based approach to place and replicate data in a manner that's likely to be available for users:

https://cloud.google.com/blog/products/data-analytics/how-bigquery-zone-assignments-work