Zen and the Art of Reliability

65 points by perch56, about 3 years ago

4 comments

amdelamar, about 3 years ago
> 4. Fail small, reducing blast radius

One more thing not mentioned here is that a microservice architecture can naturally help isolate outages to small parts of your app/website, rather than taking it down entirely.

My team supports a large microservice system, and while there are definite drawbacks to the architecture, one of the major benefits is that it's never 100% down at any given time. Usually a prod incident will make one particular button flaky or one view/page fail to load. Some users won't even notice there's an outage. On-call is paged and can quickly roll back the squeaky microservice to a previously deployed version, and let an engineer investigate the root cause in a test environment later.
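
To make the "one button or one view fails, the rest keeps working" behavior concrete, here is a minimal sketch in Python. It is not the commenter's system: `render_page`, `load_tickets`, and `load_reports` are hypothetical names, and the failing loader merely stands in for a call to a microservice that is down.

```python
from typing import Callable, Dict


def render_page(sections: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Call each section's loader independently.

    A failure in one loader produces a placeholder for that section
    only, so the rest of the page still renders.
    """
    rendered = {}
    for name, load in sections.items():
        try:
            rendered[name] = load()
        except Exception:
            # Contain the failure: only this section degrades.
            rendered[name] = f"[{name} temporarily unavailable]"
    return rendered


def load_tickets() -> str:
    # Hypothetical healthy service.
    return "12 open tickets"


def load_reports() -> str:
    # Hypothetical stand-in for a microservice that is currently failing.
    raise ConnectionError("reports service unreachable")


page = render_page({"tickets": load_tickets, "reports": load_reports})
```

Here `page["tickets"]` holds the real content while `page["reports"]` holds a placeholder, which is the small blast radius the comment describes. A real system would also log the exception and alert on-call rather than swallow it silently.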

kaycebasques, about 3 years ago
> In other words, we decided to measure only the systems we controlled. In retrospect that was naive. Our customers don't care if we run the service that fails, or a vendor we use runs the service that fails. They care that they can't use Zendesk to do their job.

Yes!
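
A rough worked example of why measuring only the systems you control overstates what customers see. It assumes every request needs all components to succeed and that failures are independent; the availability figures are made up for illustration and are not Zendesk's numbers.

```python
import math


def end_to_end_availability(component_availabilities):
    """Availability of a request that needs every component to succeed,
    assuming independent failures: the product of the individual values."""
    return math.prod(component_availabilities)


# Hypothetical figures, chosen only to illustrate the effect.
own_services = [0.9995, 0.9995]
vendor_services = [0.999]

internal_only = end_to_end_availability(own_services)
customer_view = end_to_end_availability(own_services + vendor_services)
```

Under these assumptions `customer_view` is always lower than `internal_only`, because each vendor dependency multiplies in another factor below 1.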

blakesterz, about 3 years ago
Interesting to see how they do things. Of all the many things that we use at work, Zendesk is my favorite. It never gets in my way and does things exactly the way I want; it's just great. GitHub is probably a close second, with Slack and Basecamp somewhere in the middle and anything from Atlassian always my least favorite.

ram_rar, about 3 years ago
Cloud services have come a long way. Not trying to diss the article, but scaling a CRUD app for 250k/sec is not as difficult as it used to be. It mainly comes down to how you manage state in your architecture.

Back when I was at Yahoo, serving 10k concurrent requests from a single server used to be such a big deal. Now hardly anyone thinks about it. Most of the reliability/fault tolerance/auto scaling features come from the underlying AWS/GCP services. We just need to write a decent microservice to glue these things together and voila!