I've worked on Ambry at LinkedIn for a little while, and I'd be happy to answer any questions about architecture or things we've done since 2016. I wasn't part of the original team. One thing I would call attention to from the article:<p>> it’s key-value based approach to interacting with blobs doesn’t support file-system like capabilities, posing more of a burden on the user of the system (who must manage metadata and relationships between entities themselves).<p>I think this trade-off is one of Ambry's strongest design decisions. By giving up key-value access, Ambry gets to dictate the location of an object at write time. When a partition fills up, set it to read-only and create new partitions on new hosts. By having Ambry generate the blob ID, the system can embed information (like the partition number) right in the ID. With a key-value approach you need to worry about balancing (and re-balancing) the key space over your topology. With dense storage nodes, re-balancing is VERY expensive.<p>Also--most applications don't actually need key-value access. For storing something like media (think: LinkedIn profile photo), you've already got a database row for the user profile; now one of those fields is a reference to your object store. It might as well be a storage-generated reference instead of one where the application tries to manage reference uniqueness and ends up using UUIDs or something similar anyway.<p>Apologies for the new account; I try to keep my main HN account semi-anonymous.
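To make the point about storage-generated IDs concrete, here is a minimal sketch of the idea described above: the store picks the partition at write time, seals a full partition as read-only, and embeds the partition number in the returned ID. This is an illustration only, not Ambry's actual ID format or API; `Cluster`, `Partition`, `put`, `partition_of`, and the `<partition>-<random>` ID layout are all hypothetical names and choices made for this example.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Partition:
    partition_id: int
    capacity_bytes: int
    used_bytes: int = 0
    read_only: bool = False


class Cluster:
    """Toy store that assigns blobs to partitions at write time (hypothetical)."""

    def __init__(self, partition_capacity: int) -> None:
        self.partition_capacity = partition_capacity
        self.partitions: dict[int, Partition] = {}
        self._writable_id = -1
        self._add_partition()

    def _add_partition(self) -> Partition:
        # In a real system this would land on new hosts; here it is just a new entry.
        partition = Partition(len(self.partitions), self.partition_capacity)
        self.partitions[partition.partition_id] = partition
        self._writable_id = partition.partition_id
        return partition

    def put(self, size_bytes: int) -> str:
        """Account for a blob of the given size and return a store-generated ID."""
        if size_bytes > self.partition_capacity:
            raise ValueError("blob is larger than a whole partition")
        current = self.partitions[self._writable_id]
        if current.used_bytes + size_bytes > current.capacity_bytes:
            # Partition is full: seal it and open a fresh one. Nothing is re-balanced.
            current.read_only = True
            current = self._add_partition()
        current.used_bytes += size_bytes
        # Assumed layout: partition number as hex, then a random suffix for uniqueness.
        return f"{current.partition_id:08x}-{uuid.uuid4().hex}"


def partition_of(blob_id: str) -> int:
    """Recover the partition number from the ID alone, with no lookup table."""
    return int(blob_id.split("-", 1)[0], 16)


cluster = Cluster(partition_capacity=100)
first = cluster.put(60)
second = cluster.put(60)  # does not fit in partition 0, so a new partition is opened
print(partition_of(first), partition_of(second))
```

Because the ID itself says where the blob lives, a read needs no key-to-location mapping, and existing blobs never move when capacity is added.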
Etymology: an ambry is a cabinet in a church for storing sacred items such as vessels or vestments. A common usage today is to store holy anointing oils.<p><a href="https://en.wikipedia.org/wiki/Ambry" rel="nofollow">https://en.wikipedia.org/wiki/Ambry</a>
I have worked on Ambry for a few years now. This paper is a bit old, but it captures most of the core concepts of Ambry.<p>Some of the most fascinating parts of the journey after this paper have been scaling the system to support hundreds of GiBps of throughput per cluster, supporting multiple workloads that back other databases and stream processing systems, and rethinking replication to make it compatible with public cloud. Check out the GitHub repo to learn more: <a href="https://github.com/linkedin/ambry">https://github.com/linkedin/ambry</a>