I like Slide 251: 'solve it at the application layer' :)

I'm part of a team responsible for cloud-based video editing software. We use multi-master replication (perhaps also known as optimistic replication) with our own tools throughout, but it does require careful design: keep as much data immutable as possible, give every piece of data that might be updated by different machines at the same time its own row, and put a GUID on each row.

Each machine can generate its own local IDs, which look a lot like a timestamp with some unique stuff on the end. Each row gets a GUID and a 'version' ID column, and we only apply relayed updates if the incoming version is newer (a sketch of this rule follows below). For conflicts (rare because of the design decisions above) this is largely last-timestamp-wins, but there is some Lamport-timestamp behaviour in there too for updating an existing row.

The main downside is still that all machines need to handle every write, but by batching incoming updates into larger transactions we've had no problems with a substantial volume of database updates on a dozen commodity machines. If that ever became a bottleneck, filtering rows into different shards would be an easy fix.

I'm looking forward to seeing what other people are doing with multi-master replication.
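To make the version rule concrete, here's a minimal Python sketch of the apply-if-newer logic described above. The ID format, row layout, and method names are illustrative assumptions, not the actual schema; the point is that versions advance like Lamport timestamps, so conflicting updates degrade to last-writer-wins:

    import time
    import uuid

    def make_local_id():
        # A local ID that looks like a timestamp with some unique
        # stuff on the end (exact format is an assumption).
        return f"{time.time_ns():x}-{uuid.uuid4().hex[:8]}"

    class Row:
        def __init__(self, guid, value):
            self.guid = guid     # stable GUID identifying this row on every machine
            self.version = 0     # Lamport-style version counter
            self.value = value

        def local_update(self, value, highest_version_seen):
            # Lamport rule: a new local write must outrank every version
            # this machine has seen for the row, from any source.
            self.version = max(self.version, highest_version_seen) + 1
            self.value = value

        def apply_relayed(self, incoming_version, incoming_value):
            # Last-timestamp-wins: apply a relayed update only if it is
            # strictly newer; stale or duplicate relays are dropped.
            if incoming_version > self.version:
                self.version = incoming_version
                self.value = incoming_value
                return True
            return False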
A couple of years ago I was looking at building a highly available database (MySQL in particular) and looked into the multi-master setup. While it sounds good on paper, its benefits don't warrant the high development and operation cost.

- The tables need to be changed and the application layer needs to be changed to support it, which is a big hassle and very fragile. It's easy to introduce update conflicts, and it's a nightmare when dealing with a group of updates in a transaction: you can't really roll back a transaction at the replicated nodes.

- Whenever a node fails, the replication ring is broken and updates pile up at the previous node, while subsequent nodes' data become stale. It requires immediate human attention to fix, which defeats the purpose of an HA cluster.

- Related to the above: it's very difficult to add a new master node without stopping the cluster. The "catch-up" process is very manual and fragile.

- Data on different nodes become stale under high replication load, so clients reading different masters get stale data. They're supposed to be masters, yet they serve stale data?!

- Multi-master doesn't help write scalability at all; all nodes need to handle all writes, and MySQL's single-threaded replication apply doesn't help. For read scalability, master-slave is better.

I abandoned the design after a while and chose a different approach: disk-based replication with DRBD. A two-machine cluster forms the master cluster, one active and one standby. Writes are replicated to both machines synchronously at the disk level. When the active node fails, the standby becomes active within seconds, automatically, with the complete data on disk.

The beauty of this approach is the simple design and setup. The data are always in sync, with no staleness. Failover is immediate and automatic, and the failed node catches up automatically when it comes back online. The database application doesn't need any changes and all the SQL semantics are preserved. The cluster has one IP, so clients don't need special connection logic; they just retry when a connection fails (see the sketch below).

For disaster recovery, I built another two-machine cluster in another datacenter acting as the slave, doing async replication from the master cluster. If the master cluster failed completely (as in, the datacenter burnt down), the slave cluster could become master via a manual process within 30 minutes. The 30-minute SLA covers someone getting paged, looking at the situation, and deciding to fail over; there are too many uncertainties across datacenters to fail over automatically.

Added bonus: slaves can still hang off the master cluster for read scalability, and it works with any disk-based database, not just MySQL.
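On the client side it really is just retry logic against the cluster's one IP. A minimal Python sketch; the virtual IP, port, retry count, and backoff are all placeholder values:

    import socket
    import time

    VIP = ("10.0.0.100", 3306)  # cluster virtual IP and port (placeholders)

    def connect_with_retry(addr=VIP, retries=5, backoff=2.0):
        """Connect to an active/standby cluster behind one virtual IP.

        After a failover the standby answers on the same address, so the
        client never has to discover a new host -- it just retries.
        """
        for attempt in range(1, retries + 1):
            try:
                return socket.create_connection(addr, timeout=3)
            except OSError:
                # Failover completes within seconds; back off and retry.
                time.sleep(backoff * attempt)
        raise ConnectionError("cluster unreachable after failover window")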
I'm always in favour of technology which solves the replication issues in MySQL; it's been one of the most painful parts of managing databases for me.

I'd like to throw the Tungsten Replicator into this discussion: http://code.google.com/p/tungsten-replicator/. I've been researching it for the past few weeks as a replacement for MySQL's built-in replication, hoping it solves a lot of the current pitfalls, especially in 5.0.x. There is a very thorough guide here: http://tungsten.sourceforge.net/docs/Tungsten-Replicator-Guide/Tungsten-Replicator-Guide.html
I don't know much about replication, but what happens when a master dies just after its database is updated? Will the changes not be replicated? If the user sees that the second master doesn't have the latest changes and repeats his latest actions, what happens when the first master comes back online? Will the data be duplicated or merged?
"what happens with collisions?
when two databases
update the same record
It's a race condition
solution?
solve it at the application layer "<p>I suspect this is a deal breaker for many. It certainly is for me.
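For what it's worth, 'solve it at the application layer' usually means shipping a base version with each write and letting the application merge when two replicas raced. A minimal sketch of the idea in Python; the dict layout, field names, and resolve hook are all my assumptions, not anything from the slides:

    def apply_remote_write(local, remote, resolve):
        """Apply a write relayed from another master, detecting races.

        `remote` carries the new value plus `base_version`, the version its
        writer last saw. If that matches our current version the write
        applies cleanly; otherwise both databases updated the same record
        concurrently and the application-supplied `resolve` must merge them.
        """
        if remote["base_version"] == local["version"]:
            local["value"] = remote["value"]          # clean fast path
            local["version"] += 1
        else:
            # The race condition from the slide, pushed up to the app.
            local["value"] = resolve(local["value"], remote["value"])
            local["version"] = max(local["version"], remote["base_version"]) + 1
        return local

    # Example: resolve a conflicting counter by keeping the larger value.
    row = {"version": 3, "value": 10}
    apply_remote_write(row, {"base_version": 2, "value": 12}, resolve=max)

The catch, and presumably the deal breaker, is that every conflict-prone table needs its own resolve policy.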
Ok. It's now on GitHub if anyone is interested in following: https://github.com/alfie/MySQL--Replication