The Network is Reliable

57 points by r4um almost 11 years ago

7 comments

falcolas almost 11 years ago
The network is not reliable, but usually the cost of manually fixing problems arising from infrequent types of instability is less than the cost of pre-emptively addressing the issue.

As a practical example, our preferred HA solution for MySQL replication has effectively no network partition safety: if a network becomes partitioned, we'll end up with split brain. However, we have not once had to deal with this specific problem in our years of operation on hundreds of servers.

That said, do make the assumption that your AWS instances will be unable to reach each other for 10+ seconds on a frequent basis. Your life will be happier if you've already planned for that.
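
As an illustration of what planning for 10+ second outages could look like (this is not from the original comment), here is a minimal sketch of a retry loop with exponential backoff and jitter. The function name `call_with_retry`, its parameters, and the 30-second budget are assumptions chosen for the example; the only real requirement taken from the comment is that the total retry budget comfortably exceeds 10 seconds.

```python
import random
import time


def call_with_retry(operation, max_wait_seconds=30.0, base_delay=0.5,
                    max_delay=8.0, sleep=time.sleep, clock=time.monotonic):
    """Retry `operation` on ConnectionError until it succeeds or the budget runs out.

    All names and defaults here are hypothetical; tune them to your system.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait_seconds
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            # Exponential backoff with full jitter, capped at max_delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            if clock() + delay > deadline:
                # Budget exhausted: surface the failure to the caller.
                raise
            sleep(delay)
            attempt += 1
```

A caller would wrap any network operation that raises `ConnectionError`, for example `call_with_retry(lambda: fetch_replica_status())`, where `fetch_replica_status` is a placeholder for your own function. Retrying is only safe when the operation is idempotent, and it does nothing to prevent split brain; it only rides out short periods of unreachability.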
peterwwillis almost 11 years ago
Takeaways:

* Network partition tolerance can be designed around, assuming infinite time and money

* Network partition tolerance depends on the application

* Mitigating potential failure requires having a very long view on very fine details

* Most organizations will not be able to engineer solutions to address all network partition-related outages
jrullmann almost 11 years ago
Great article. A lot of engineers don't have personal experience with these kinds of network failures, so sharing stories of their consequences means more engineers can make informed (and conscious) decisions about how much risk can be tolerated for their applications.

One thing that you could glean from this article, and I think that this is incorrect, is that the application or operations engineer is responsible for understanding the nuances of distributed systems. In my experience the number of people who are relying on distributed systems is much larger than the number of people who understand these issues.

So what we really need are systems we can build on whose developers understand how to build (and test!) the nuances of data convergence, consensus algorithms, split-brain avoidance, etc. We need systems to gracefully and automatically deal with and recover from network failures.

Full disclosure: I'm an engineer at FoundationDB
blutoot almost 11 years ago
I feel like the authors (or someone else) can do a lot more justice to their overall objective (i.e. tease out patterns) by applying some kind of a qualitative content analysis of case studies [0].

[0] http://www.qualitative-research.net/index.php/fqs/article/view/75/153January%202006
blutoot almost 11 years ago
There was some discussion on a preliminary version of this article/blog-post [0] last year: https://news.ycombinator.com/item?id=5820245

[0] http://aphyr.com/posts/288-the-network-is-reliable
jchrisa almost 11 years ago
Related reading on data structures that make availability easier to maintain under network partition: http://writings.quilt.org/2014/05/12/distributed-systems-and-the-end-of-the-api/
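
To make "data structures that make availability easier to maintain under network partition" concrete, here is a minimal sketch of one well-known example, a grow-only counter (G-Counter) CRDT. It is not taken from the linked post, and the class and method names are assumptions for illustration. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas that diverged during a partition converge once they can exchange state again.

```python
class GCounter:
    """Grow-only counter CRDT (illustrative sketch, hypothetical API)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica max is commutative, associative, and idempotent,
        # so merges can arrive in any order and be repeated safely.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


# Two replicas accept writes independently while partitioned...
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
# ...and converge once they exchange state in either direction.
a.merge(b)
b.merge(a)
```

Both replicas stay writable during the partition, which is the availability benefit; the trade-off is that only operations with a conflict-free merge (here, addition) can be modeled this way.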
KaiserPro almost 11 years ago
The headline states that the network is reliable, but the article then goes on to list lots of cases where the network fails.