Don’t run this on any system you expect to be up they said, but we did it anyway

129 points by vdloo, almost 8 years ago

9 comments

contingencies, almost 8 years ago
So... they're not running any kind of devops system at all. If they were, they could just run the upgrade on the image, test it, deploy it. All the "months of careful planning and many many tests" they did are basically wasted time.

I wouldn't be proud of this, quite the opposite. I would suggest critically reviewing the entire infrastructure management strategy, since months lost to a single upgrade is obviously indicative of greater problems.
falcolas, almost 8 years ago
A nice writeup of a neat (if risky) upgrade.

> static IPs

FWIW, I personally love Virtual IPs (VIPs) for this (basically, an existing network interface advertises more than one IP, and that IP can be moved dynamically between servers with an ARP call). The downside is that a lot of cloud providers don't support externally available VIPs. They do, however, offer their own nearly identical solution (such as Elastic IPs from Amazon).

The use of VIPs or similar could potentially have avoided the need for such a risky upgrade, also saving millions of dollars in the process. Of course, I could simply be missing some hidden requirement from customers that they *couldn't* use VIPs, but that's pretty uncommon, even in the finance industry.
markatto, almost 8 years ago
They're still taking downtime for this... Even if they're forced to have a no-VIP, no-HA, no-LB setup (seems insane to me), it should be much simpler to set the DNS TTL to a low value right before and switch it to the new IP after the new box comes up.
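
The timing behind the low-TTL approach described above can be sketched roughly as follows. This is only an illustration, assuming resolvers honor the record's TTL; the helper name `cutover_plan`, the date, and the TTL values are hypothetical and do not come from the original post.

```python
from datetime import datetime, timedelta


def cutover_plan(switch_at, old_ttl_s, low_ttl_s):
    """Return the key times for a DNS cutover that relies on a lowered TTL.

    Hypothetical helper: assumes every resolver honors the record's TTL.
    """
    # The low TTL has to be published at least one old-TTL period before
    # the switch, so caches still holding the long-lived record expire it.
    lower_ttl_by = switch_at - timedelta(seconds=old_ttl_s)
    # After the record points at the new IP, caches may keep serving the
    # old IP for at most one low-TTL period.
    old_ip_gone_by = switch_at + timedelta(seconds=low_ttl_s)
    return lower_ttl_by, old_ip_gone_by


# Illustrative numbers only: a 24-hour TTL lowered to 60 seconds.
switch = datetime(2017, 7, 19, 3, 0, 0)
lower_by, gone_by = cutover_plan(switch, old_ttl_s=86400, low_ttl_s=60)
print(lower_by.isoformat())
print(gone_by.isoformat())
```

In practice some resolvers ignore or clamp short TTLs, so the second timestamp is a lower bound on when the old box can safely go away, not a guarantee.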
gargravarr, almost 8 years ago
On the one hand I am very impressed they managed this, but on the other, it does seem very sledgehammer/nut-esque. Even without virtual IPs, it seems a little silly that their customers weren't running N+1 redundant instances that could be taken out, upgraded and then swapped without disrupting normal operations.

Again, very impressive as an academic exercise, especially considering the given script isn't actually that complicated, but wow, they had some serious guts running this in production!
sdiq, almost 8 years ago
"It was like replacing the wheels on a moving vehicle"

That reminds me of this crazy video I once watched. https://youtube.com/watch?v=Cad8fyYeFnY
mankash666, almost 8 years ago
I think the authors missed a good opportunity to move towards containers to avoid these problems in the future. While interesting academically, this approach is wrong for the long run.
astrodust, almost 8 years ago
Given how much memory some servers have these days, which for an application node is often more than the necessary hard-disk capacity, this is quite a clever approach.
loa_in_, almost 8 years ago
My first thought was that this was about the `reboot` binary.
conatus, almost 8 years ago
Very nice!