Quick things

1) I expected that thread to look like a Cultural Revolution "struggle session". Thankfully it wasn't.

2) As I am sure many others have already said, durability has very little to do with CAP. CAP is about the A and I in ACID; D is an orthogonal concern.

3) Durability doesn't necessarily mean giving up high performance. Most databases let the user choose how much data they're willing to lose in exchange for lower latency -- the standard approach (used even in main-memory databases like RAMCloud[1] and recent versions of VoltDB[2]) is to keep a separate write-ahead log (WAL) and let the end user choose how frequently to fsync() it to disk, as well as how frequently to flush a snapshot of the in-memory data structures to disk (a toy sketch of this knob follows point 4 below).

There are many papers (e.g., http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.174.6205) on the challenges of building WALs, but fundamentally, users who want the strongest possible single-machine durability can choose to fsync() on every write (and usually use battery-backed RAID controllers, separate log devices like SSDs with supercapacitors, or even NVRAM if the writes are larger than what fits in the RAID controller's write-back cache). Others can choose to live with the possibility of losing some writes but use replication[3] to protect against power supply failures and crashes -- the idea being that machines in a datacenter are connected to a UPS, replicas don't all live on the same rack (to protect against -- usually rack-local -- UPS failures), and there's cross-datacenter replication (usually asynchronous, with the possibility of some conflicts -- a notable exception being Google's Spanner/F1) to protect (amongst many other things...) against someone leaning against the big red button labeled "Emergency Power Off" (which is exactly what you think it is).

Snapshot flushes do also hurt bandwidth on spinning disks and old or cheap SSDs, but there's a solution: use a good but commodity MLC SSD with synchronous toggle NAND flash and a good controller/firmware (SandForce SF-2000 or later series, or Intel/Samsung/Indilinx's recent controllers) -- these work on the same principle as DDR memory (latching on both edges of a signal) and provide enough bandwidth to handle both random reads (the traffic you're serving) and sequential writes (the flush).

4) I know several tech companies and/or engineering departments therein that absolutely love and swear by redis. There are very good reasons for it: the code is extremely clean and simple[4], and it handles a use case that neither conventional databases nor pure kv-stores or caches handle well.

That use case is, roughly, data structures on an outsourced heap used to maintain a materialized view (such as a user's newsfeed, adjacency lists of graphs stored efficiently as compressed bitmaps, counts, etc.) on top of a database. So my advice to antirez is to focus the effort on making this use case simpler rather than building redis out into a database: build primitives that let developers piggyback durability and replication on a database or a message queue. In fact, I know of multiple startups that have (in an ad-hoc way) implemented pretty much exactly that.

This is still a tough problem, but one which (I think) would yield a lot more value to redis users.
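Going back to point 3 for a moment, here is a minimal sketch of that latency-vs-durability knob, assuming a toy single-writer WAL in Python -- it is not how redis or any particular database implements it, and the policy names only loosely mirror the "always / every interval / never" choices these systems tend to expose:

    import os
    import time

    class ToyWAL:
        """Append-only log with a user-chosen fsync policy:
        "always"   -> fsync after every append (max durability, highest latency)
        "interval" -> fsync at most once per `interval` seconds (bounded loss)
        "never"    -> leave flushing to the OS (fastest, least durable)
        """
        def __init__(self, path, policy="interval", interval=1.0):
            self.f = open(path, "ab", buffering=0)
            self.policy = policy
            self.interval = interval
            self.last_sync = time.monotonic()

        def append(self, record: bytes):
            # Length-prefix each record so a torn trailing write can be
            # detected and discarded on recovery.
            self.f.write(len(record).to_bytes(4, "big") + record)
            if self.policy == "always":
                os.fsync(self.f.fileno())
            elif self.policy == "interval":
                now = time.monotonic()
                if now - self.last_sync >= self.interval:
                    os.fsync(self.f.fileno())
                    self.last_sync = now
            # "never": rely on the kernel's writeback; a crash may lose recent writes

        def close(self):
            os.fsync(self.f.fileno())
            self.f.close()

The "always" case is where battery-backed RAID controllers and supercap SSDs earn their keep: the fsync() only has to reach the controller's (or drive's) protected cache, not the platters.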
Just thinking out loud, one approach could be a way to associate each write to redis with an external transaction id (an HBase LSN, a MySQL GTID, or perhaps an offset in a message queue like Kafka). When redis flushes its data structures to disk, it stores the last flushed transaction id to persistent storage.

I would also implement fencing within redis: in a "fenced" mode, redis won't accept any requests on its normal port, but can accept writes through a bulk batch-update interface that users can program against. This could be made more fine-grained by having both a read fence and a write fence, etc.

This makes it easier for users to tackle replication and durability themselves:

For recovery/durability, users can configure redis so that after a crash it is automatically fenced and "calls back" into their own code with the last flushed id -- by invoking a plugin, making a REST or RPC call to a specified endpoint, or simply fork()ing and executing a user-configured script that uses the bulk API.

For replication, users could use a high-performance durable message queue (something I'd imagine some users already do) -- a (write-fenced) standby redis node can then become a "leader" (unfence itself) once it's caught up to the latest "transaction id" (the last consumed offset in the message queue, as maintained by the message queue itself -- in Kafka's case this is stored in ZooKeeper). More advanced users can tie this into database replication by either tailing the database's WAL (with a way to transform WAL edits into requests to redis) or using a pluggable storage engine for the database. (A rough sketch of this fence-then-catch-up flow is further below.)

Fundamentally, where I see redis used successfully is in use cases where (prior to redis) users would have written custom C/C++ code. This cycles back to the "data structures on an outsourced heap" idea -- redis lets you use a high-level language to do fast data manipulation without worrying about the performance of the code (especially in a language like Ruby or Python) or about garbage collection on large heaps (a problem with even the most advanced concurrent GCs, like Java's).

There have been previous attempts to build these outsourced heaps as end-to-end distributed systems that handle persistence, replication, scale-out, and transactions. These are generally called "in-memory data grids" -- some simply provide custom implementations of common data structures; others act almost completely transparently and require no modifications to the code (e.g., some by using JVMTI). Terracotta is a well-known one with a fairly good reputation (friends who contract for financial institutions and live in the hell^H^H^H^Hworld of app servers and WAR files swear by it), but Jini and JavaSpaces were some of the first (they came too early, way before the market was ready) and are rightly still covered by most distributed systems textbooks.
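To make the fencing/transaction-id idea above concrete, a minimal sketch -- everything in it is hypothetical: FencedStore, bulk_apply, and the log_reader callback are stand-ins for a fenced redis instance, its bulk catch-up interface, and whatever durable log (database WAL, Kafka-like queue) the user replays from; none of these are real redis or Kafka APIs:

    class FencedStore:
        """Stand-in for a redis node that starts fenced after a crash/failover."""
        def __init__(self):
            self.data = {}
            self.last_applied_txid = 0   # would be persisted with every flush/snapshot
            self.fenced = True           # normal traffic is rejected until caught up

        def bulk_apply(self, txid, key, value):
            # Hypothetical bulk/catch-up interface: the only writes allowed while fenced.
            assert self.fenced, "bulk_apply is only for fenced catch-up"
            self.data[key] = value
            self.last_applied_txid = txid

        def unfence(self):
            self.fenced = False          # now safe to serve normal reads and writes

    def recover(store, log_reader):
        """Replay everything newer than the last flushed txid from an external
        durable log, then lift the fence."""
        for txid, key, value in log_reader(after=store.last_applied_txid):
            store.bulk_apply(txid, key, value)
        store.unfence()

    # Usage with a stand-in log source (a real one would tail a WAL or a queue):
    def fake_log(after):
        for txid in range(after + 1, after + 4):
            yield txid, "key:%d" % txid, "value:%d" % txid

    store = FencedStore()
    store.last_applied_txid = 10         # read back from the last snapshot
    recover(store, fake_log)
    print(store.last_applied_txid)       # 13 -- caught up, fence lifted

The same shape works for the standby case: a write-fenced replica keeps running this catch-up loop until its last applied id reaches the tip of the queue, then unfences itself and takes over as leader.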
However, the successful use of these data grids usually requires InfiniBand or 10GbE (or Myrinet back in the dotcom days) -- reliable low-latency message delivery is needed because (with no API to speak of) there's no easy way for users to recover from network failures or handle non-atomic operations.

To sum it up, I'd suggest examining and focusing on the use cases where redis is already *loved* by its users, not trying to build a magical end-to-end system (as that won't preserve the former), and making it easy (and to an extent redis already does this) for users to build custom distributed systems with redis as a well-behaved component (again, they're already doing this).

[1] https://ramcloud.stanford.edu/wiki/display/ramcloud/Recovery

[2] http://voltdb.com/intro-to-voltdb-command-logging/

[3] Whether replication is synchronous or not is about the atomicity guarantees, not durability -- the failure mode of acknowledging a write and then "forgetting" it can happen in these systems even if they fsync every write.

[4] It reminds me of the NetBSD source code: I can open up a method and it's very obvious what it does and how.