Is it me, or does the hand-waving at the beginning of the article between "write" and "update" smell of bad spin? As a developer I consider both "creates" and "updates" to be "writes".<p>"Riak is designed to accomodate (sp) server and network failures without ever losing committed writes, so this led to a quick response from Basho’s engineers."<p>So when I read documentation, losing a write means losing a create, an update or a delete: essentially any side-effecting operation, anything that needs to write to disk to record a change...
Distributed systems design aside, the core of the problem is that they relied on NTP (as they probably should), and in their case NTP was not working properly.
I don't get how "clocks are bad" follows from the article.<p>I get that syncing clocks across systems is hard, and that when it goes awry there are unintended consequences.
Dumb question: What breaks with the following approach?<p>1) set last_write_wins=true (so all updates always apply, as described in the article)<p>2) avoid the "partition/rejoin may cause old values to stomp on new" issue by having "rejoin detection" which refuses to rejoin if clocks are "too out of sync"
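To make the two steps concrete, here is a minimal sketch of what I have in mind. It is not Riak's implementation; `Versioned`, `lww_merge`, `may_rejoin` and the `MAX_SKEW_SECONDS` threshold are all hypothetical names and values I made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: how far apart two clocks may be before rejoin is refused.
MAX_SKEW_SECONDS = 0.5


@dataclass
class Versioned:
    """A value tagged with the wall-clock time of the node that wrote it."""
    value: str
    timestamp: float


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Step 1: last write wins. Keep whichever copy has the larger timestamp."""
    return a if a.timestamp >= b.timestamp else b


def may_rejoin(local_clock: float, peer_clocks: list[float],
               max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """Step 2: allow rejoin only if every peer clock is within max_skew of ours."""
    return all(abs(local_clock - peer) <= max_skew for peer in peer_clocks)


# A node whose clock is a full second off is refused.
print(may_rejoin(100.0, [101.0]))        # False

# A node 0.2s behind is inside the threshold, so it is allowed back in.
print(may_rejoin(100.0, [100.3, 99.8]))  # True

# But a write it took later in real time still carries the smaller timestamp,
# so the merge keeps the older value.
first = Versioned("old", 100.0)   # written first, on the node with the correct clock
second = Versioned("new", 99.8)   # written afterwards, on the node running 0.2s slow
print(lww_merge(first, second).value)    # old
```

One thing the sketch makes visible: the check only compares clocks at rejoin time, so any skew under the threshold can still reorder writes, and it says nothing about timestamps already attached to writes taken during the partition.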