I highly recommend reading “Designing Data-Intensive Applications” by Martin Kleppmann to get a thorough overview with lots of references. Reading this book is a timesaver compared to tracking down all this information across blog posts.<p><a href="https://www.amazon.de/dp/1449373321/" rel="nofollow">https://www.amazon.de/dp/1449373321/</a>
This is a pretty much complete introduction to database replication techniques.<p>If you're more interested in the quorum-based tech that's come about (algorithms like Paxos and Raft) and the underlying distributed systems research, I recently tried to compile the whole Paxos family of algorithms in a blog post [0]. There has been a lot of recent work (notably EPaxos, WPaxos, SDPaxos) on tuning and improving the algorithms, including consideration for long-distance WAN connections (that's what the W in WPaxos stands for).<p>[0]: <a href="https://vadosware.io/post/paxosmon-gotta-concensus-them-all" rel="nofollow">https://vadosware.io/post/paxosmon-gotta-concensus-them-all</a>
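To make the quorum idea concrete: classic Paxos and Raft rely on majority quorums, and the property that matters is that any two majorities of the same cluster share at least one node. Here is a minimal Python sketch that brute-force checks this for small clusters. The function names are mine, purely for illustration, and this is not taken from the blog post or from any of the variants mentioned above (some of which use different quorum rules).

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1


def all_majorities_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a node."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"n={n} majority={majority(n)} "
              f"intersect={all_majorities_intersect(n)}")
```

Because of that overlap, a value accepted by one majority cannot be missed by a later majority, which is the intuition behind why these protocols stay safe when a minority of nodes is down.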
This article is pretty thorough on the subject of replication. It goes through pretty much all the options and trade-offs I've learned over my career in database operations.<p>The takeaway I hope people get from this article is that no single solution works best in all situations. It is a design decision about which advantages your application needs and which sacrifices you are able to make. The more limited and strict you can be about how your application reads and writes data, the more options you have to make your database backend more robust. Applications that "try to have it all" usually just end up doing everything poorly.<p>The other takeaway is that the physical limits of the universe are the greatest barrier database systems are working against. You simply cannot get around them. When you end up working with a team that is unwilling to acknowledge this, run as fast as you can.
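One way to put a number on those physical limits, assuming the limit in question is the speed of light: a rough lower bound on network round-trip time between two sites. This is a back-of-the-envelope Python sketch, not a measurement. The 2/3 factor for light in optical fiber is an approximation, the 6000 km distance is a made-up example, and real links are slower because of routing and equipment delays.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3               # assumption: light in fiber travels at roughly 2/3 c


def min_round_trip_ms(distance_km: float,
                      fiber_factor: float = FIBER_FACTOR) -> float:
    """Lower bound on round-trip time over a straight path of distance_km."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * fiber_factor)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    distance_km = 6000  # hypothetical distance between two data centers
    print(f"fiber:  {min_round_trip_ms(distance_km):.1f} ms")
    print(f"vacuum: {min_round_trip_ms(distance_km, 1.0):.1f} ms")
```

A write that waits for a synchronous acknowledgement from a replica at that distance cannot complete faster than this bound, no matter how the database is tuned.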
Very tiny nit: <i>DynamoDB is not Dynamo.</i> Leaderless replication was popularized by Amazon's original Dynamo paper, true, but DynamoDB is not the same system [1].<p>[1] <a href="https://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html" rel="nofollow">https://www.allthingsdistributed.com/2012/01/amazon-dynamodb...</a>
There are solutions to this now:<p><a href="https://cockroachlabs.com" rel="nofollow">https://cockroachlabs.com</a><p><a href="https://foundationdb.org" rel="nofollow">https://foundationdb.org</a><p>Of course, there are still tradeoffs, but a distributed database with strong consistency guarantees is likely to be a good choice for many projects.
This is a really great article and overview. It is a tad light on the most important subject, CRDTs (Conflict-free Replicated Data Types).<p>After having worked on these types of problems for a decade, and successfully running an open source database startup for nearly half a decade, I'm certain they are the only right way to go.<p>The article seems to suggest they aren't well adopted, and that severe limitations are holding back their adoption.<p>This was true with append-only or log-based CRDTs.<p>But now, state-based graph CRDTs (like we've implemented in <a href="https://github.com/amark/gun" rel="nofollow">https://github.com/amark/gun</a> ) have solved that. The only thing you can't do with them is Global Strong Consistency (think banking), but their Strong Eventual Consistency guarantees make them the best solution for literally everything else.<p>They are also run in production at very large sites, like the Internet Archive (top 300 site globally), D.Tube (1M monthly uniques), notabug.io (P2P reddit), etc. with GUN.<p>The article kind of makes this joke:<p>> (otherwise our lives would be too easy, right?)<p>But here is the kicker: both the Internet Archive and NAB integrated/built in 1 week. Yes, it literally makes people's lives easier.<p>Next time you build a non-banking app, you should consider using a state-based graph CRDT to cover 99.99% of your use cases!
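For anyone who hasn't seen a state-based CRDT, here is a minimal Python sketch of the simplest one, a grow-only counter (G-Counter). To be clear, this is not GUN's graph CRDT or its API. It only illustrates the general idea behind Strong Eventual Consistency: replicas exchange full state, and the merge is commutative, associative and idempotent, so every replica converges regardless of message order or duplication. The class and replica names are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> None:
        """Each replica only ever increases its own slot."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum over all replica slots."""
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> GCounter:
        """Return a new state holding the per-replica maximum of both inputs."""
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})


if __name__ == "__main__":
    a, b = GCounter(), GCounter()
    a.increment("replica-a", 2)   # hypothetical replica names
    b.increment("replica-b", 3)
    print(a.merge(b).value())     # same result as b.merge(a).value()
```

Merging in either order, or merging the same state twice, gives the same result, which is why no coordination is needed between replicas.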