
Dynamo Systems Work Too Hard

78 points by jamesmiller5 about 12 years ago

8 comments

jbellis about 12 years ago

This misses the point.

There are two main reasons why, when I was researching scalable databases, I primarily gravitated towards Dynamo-style replication (Cassandra, Voldemort, and at the time, Dynomite):

- There is no such thing as failover. Dynamo replication takes node failure in stride. This is what you want for a robust system where "Network Partitions are Rare, Server Failures are Not." Not only does it prevent temporary unavailability during the failover, it rules out an entire class of difficult, edge-case bugs. (Which every master-election-and-failover system out there has been plagued with.)

- It generalizes to multiple datacenters as easily as to multiple machines, allowing local latencies for reads AND writes, in contrast to master-based systems where you always have to hit the master (possibly cross-DC) for at least writes. (Couchbase is unusual in that it apparently forces read-from-master as well.) Cassandra has pushed this the farthest, allowing you to choose synchronous replication to local replicas and asynchronous to remote ones, for instance: http://www.datastax.com/docs/1.2/dml/data_consistency

/Cassandra project chair
Comment #5652895 not loaded
pbailis about 12 years ago

There's at least one good reason for Dynamo's write-to-all and read-from-all mechanism: latency.

What you've called 'W=2' in Couchbase is "write to master and at least one slave." Dynamo-style 'W=2' means "write to any two replicas." This can decrease tail latencies since you don't have to wait for the master--any two will do; similarly for 'R=2'. Indeed, Dynamo 'W=2, R=2' will incur more read load than master-based reads (at least double, but not necessarily triple, in your figures). So I think it's more accurately a trade-off between latency and server load.

There can be big benefits to this redundant work. For example: http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/

But don't take it from me: http://cacm.acm.org/magazines/2013/2/160173-the-tail-at-scale/fulltext

Anyway, I'm pretty sure CASSANDRA-4705 (https://issues.apache.org/jira/browse/CASSANDRA-4705), which allows for Dean-style redundant requests, both decreases the read load (at least from the factor of N in your post) *and* should still reduce tail latency without compromising on semantics.

I don't have skin in this game, but I'm pretty sure that the Dynamo engineers had a good idea of what they were doing. (That said, the regular [non-linearizable] semantics for R+W>N are sort of annoying compared to a master-slave system, but can be fixed with write-backs.)
Comment #5653710 not loaded
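The latency trade-off pbailis describes can be sketched numerically. The snippet below is an illustrative model only (the latency distribution and the assumption that writes go out in parallel are invented for the example, not taken from Couchbase or Dynamo): a master-based 'W=2' write must wait for the master plus its fastest slave, while a Dynamo-style 'W=2' write completes once any two of the N replicas acknowledge, i.e. it waits only for the second-fastest replica.

```python
import random

# Toy model of write latency under two 'W=2' schemes.
# N replicas; each replica is usually fast (1 ms) with an
# occasional slow outlier (20 ms) to model tail latency.
N, W = 3, 2
random.seed(1)

def replica_latencies():
    return [random.choice([1, 1, 1, 20]) for _ in range(N)]

def master_write(lat):
    # Master-based W=2: must include replica 0 (the master),
    # plus the fastest of the slaves.
    return max(lat[0], min(lat[1:]))

def dynamo_write(lat):
    # Dynamo-style W=2: any two acks will do, so the client
    # waits only for the W-th fastest replica.
    return sorted(lat)[W - 1]

trials = [replica_latencies() for _ in range(10_000)]
print("master-based avg:", sum(master_write(l) for l in trials) / len(trials))
print("dynamo-style avg:", sum(dynamo_write(l) for l in trials) / len(trials))
```

Because the Dynamo-style write waits for the best two replicas rather than one fixed replica plus one, its per-request latency is never worse in this model, which is where the tail-latency win comes from; the cost is the extra load on all N replicas.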
gigq about 12 years ago

This also exactly describes how HBase works. I've always preferred HBase to Cassandra for this exact reason. You put far less read load on your servers, and you don't have to worry about most of the things on http://wiki.apache.org/cassandra/Operations.

Another benefit that is not mentioned is that with a master-based system you can easily move who is responsible for the data if a server starts to hotspot. In Cassandra you have to use random key distribution, because if a server does hotspot the only solution is to split the token ring, an intensive operation that is hard to do while the server is under heavy load.
Comment #5654194 not loaded
Comment #5653885 not loaded
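gigq's point about splitting the token ring rests on consistent hashing: each node owns the arc of the hash ring up to its token, so relieving a hotspot means adding or moving tokens, which relocates data while the node is already overloaded. A minimal, hypothetical sketch of that key-to-node mapping (node names and the md5-based hash are illustrative; this is not Cassandra's actual partitioner code):

```python
import bisect
import hashlib

RING = 2**32  # size of the hash ring

def token(s):
    # Hash a string onto the ring (md5 chosen only for the example).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING

class TokenRing:
    def __init__(self, nodes):
        # Each node's token is the hash of its name; keep them sorted
        # so ownership lookups are a binary search.
        self.tokens = sorted((token(n), n) for n in nodes)

    def owner(self, key):
        # A key belongs to the first node whose token follows the
        # key's hash, wrapping around the ring.
        ts = [t for t, _ in self.tokens]
        i = bisect.bisect(ts, token(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = TokenRing(["nodeA", "nodeB", "nodeC"])
print(ring.owner("user:42"))
```

Because ownership is fixed by the hash, a hot key range can only be offloaded by inserting a new token inside the hot node's arc, and every key between the old and new tokens must then move; a master-based system can instead just reassign the range.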
gukjoon about 12 years ago

Great article, Damien. This idea that network partitions are exceedingly rare is the reason ElasticSearch goes CA rather than the AP that many other NoSQL datastores choose.

http://elasticsearch-users.115913.n3.nabble.com/CAP-theorem-tp891925p894234.html

Not only are network partitions rare, the most disastrous case, where the cluster splits in half, is even rarer. Usually you have a small part of the cluster partition away.

I hope people don't take this as a Dynamo vs. Couch discussion, because the relative importance of partition tolerance is a topic that spans all datastores that give up on ACID.
Comment #5654878 not loaded
crb about 12 years ago
When your units of networking concern are "availability zones" (i.e. data centers) rather than just switches, wouldn't network failures now be more common than server failures?
Comment #5654416 not loaded
jeremiahjordan about 12 years ago

Even if switch failures are rarer, Couch at W=1 will silently drop data during a network partition and Dynamo at W=2 won't, so how is the comparison at the end valid?
Comment #5652749 not loaded
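jeremiahjordan's durability point can be made concrete with a toy model (the semantics below are assumed for illustration, not drawn from Couchbase or Dynamo source): an ack at W=1 means exactly one replica holds the write, so losing that node before it replicates loses acknowledged data, while at W=2 any single node failure leaves at least one surviving copy.

```python
# N = 3 replicas, numbered 0..2; replica 0 plays the master.
N = 3

def surviving_copies(acked_replicas, failed_node):
    # Which replicas still hold an acknowledged write after one
    # node fails, assuming replication had not yet caught up.
    return [r for r in acked_replicas if r != failed_node]

# W=1: only the master (replica 0) has the write at ack time.
# If the master then fails, the acknowledged write is gone.
assert surviving_copies({0}, failed_node=0) == []

# W=2: two replicas acked before the client saw success, so any
# single node failure still leaves at least one copy.
for failed in range(N):
    assert len(surviving_copies({0, 1}, failed)) >= 1

print("W=2 survives any single node failure")
```

This is the asymmetry behind the comment: the switch-MTBF-vs-node-MTBF comparison at the end of the article only holds if both systems give the same durability guarantee at ack time, and at W=1 they don't.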
pkolaczk about 12 years ago

Why compare the MTBF of a single network switch to the MTBF of a node? Why not compare the MTBF of a single network switch to the MTBF of a single CPU or motherboard? Unless you're talking about a hobby-size network, there is usually much more between the nodes than a single network switch.
leef about 12 years ago
Despite the name you can't actually assume DynamoDB is based on the Dynamo paper architecture.
Comment #5654408 not loaded
Comment #5654102 not loaded