Cloudflare Dashboard and API Outage on April 15, 2020

32 points by jplevine, about 5 years ago

6 comments

atonse, about 5 years ago
This is not a critique. Cloudflare is clearly a solid, well-engineered system given its scale; just look at their other post-mortems.

But it's just kind of interesting: you can have all the redundant systems and smart software, and some dude could accidentally pull cables – oh humans!

Would love to see what mitigations they came up with other than the ones listed (apart from probably putting 20 BRIGHT RED labels next to the patch panels saying DO NOT DISCONNECT, EVER EVER EVER!).

Perhaps one mitigation could be a better way to literally identify who's there, call them up within seconds, and ask what they just did?
bogomipz, about 5 years ago
> "Documentation: After the cables were removed from the patch panel, we lost valuable time identifying for data center technicians the critical cables providing external connectivity to be restored. We should take steps to ensure the various cables and panels are labeled for quick identification by anyone working to remediate the problem. This should expedite our ability to access the needed documentation."

So they failed to label their cables? I'm sorry, but this is "datacenter 101" stuff. How are none of the cables plugged into your patch panels labeled? Every colo has a label gun you can borrow! Also, remote hands will gladly send you a pic of a rack or cabinet to verify what they're looking at.
idrism, about 5 years ago
It's strange to me that their remediation did not include distributing these systems to be redundant across multiple datacenters, maybe with a globally distributed database.

> we knew that the failback from disaster recovery would be very complex

Disaster recovery failover to a second data center (and failback) should not force a choice about whether to fail over. They should be able to fail over immediately, and the system should self-heal once the original data center is back online.
cookiecaper, about 5 years ago
I'll just leave this here ... https://github.com/netbox-community/netbox
rkwasny, about 5 years ago
In summary, 10% of internet traffic relies on one patch panel somewhere :)
majjaa, about 5 years ago
Is it me, or does it feel like these post-mortem blog posts are becoming extremely common with Cloudflare?