AWS power failure in US-EAST-1 region killed some hardware and instances

129 points by bratao, over 3 years ago

7 comments

iJohnDoe, over 3 years ago
Not piling on AWS. These things happen and I’m sure everyone involved is working to improve things. Yes, everyone should deploy in multiple availability zones.

My 2 cents. Outages happen. Network glitches happen. Bad configs and bad updates happen. However, power issues should not really happen. One of the primary cost-saving areas of going to the cloud is not having to do on-prem power, such as UPS, generators, maintenance, etc. Not having to do on-prem cooling is another. From the customer’s perspective, these should be solved problems when going into a professional data center, and they are the things you don’t want to worry about anymore.
CoastalCoder, over 3 years ago
Any idea why a power *failure* would cause (or reveal) hardware damage?

Leading up to Y2K, I remember concerns about spinning hard disks not being able to start up again.

And if the power is *flaky* with spikes and brown-outs, I understand that's a problem.

But is either of those relevant to AWS?
zkirill, over 3 years ago
Have there been any incidents that affected more than one AZ?

AWS RDS in a multi-AZ deployment gives you two availability zones. Aurora gives you three. What kind of scenario would justify three AZs for the purposes of high availability?
gundmc, over 3 years ago
Rare to see power issues at a modern data center cause downtime. All of those racks should have UPS and batteries to sustain during an outage until the automatic transfer switch can fail over to a redundant system or generator. Would be interested in reading more about what happened here.
testemailfordg2, over 3 years ago
Not an expert, but power-cycling servers with HDDs more frequently, and in a controlled way, could make sure that not a lot of servers go down in one go when events like these happen. It also gives you a chance to identify patterns...
1cvmask, over 3 years ago
A lot more companies will go to a multi-cloud active-active architecture, maybe even with bare-metal redundancy.
daneel_w, over 3 years ago
*"As is often the case with a loss of power, there may be some hardware that is not recoverable..."*

No. Not even rarely. If they lost hardware because of this, something much different from just a loss of power happened on their servers' mains rails.