科技回声

9 comments

contingencies, nearly 8 years ago

So... they're not running any kind of devops system at all. If they were, they could just run the upgrade on the image, test it, deploy it. All the "months of careful planning and many many tests" they did are basically wasted time.

I wouldn't be proud of this, quite the opposite. I would suggest critically reviewing the entire infrastructure management strategy, since months lost to a single upgrade is obviously indicative of greater problems.

falcolas, nearly 8 years ago

A nice writeup of a neat (if risky) upgrade.

> static IPs

FWIW, I personally love Virtual IPs (VIPs) for this (basically, an existing network interface advertises serving more than one IP, and can change that IP dynamically between servers with an ARP call). The downside is that there are a lot of cloud providers who don't support externally available VIPs. They do, however, offer their own nearly-identical solution (such as Elastic IPs from Amazon).

The use of VIPs or similar could have potentially avoided the need for such a risky upgrade, potentially also saving millions of dollars in the process. Of course, I could simply be missing some hidden requirement from customers that they couldn't use VIPs, but that's pretty uncommon, even in the finance industry.
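To make the VIP idea concrete, here is a toy model in Python of an address moving between two servers. It does not touch a real network: `ArpTable`, `Server`, `move_vip`, and all the addresses are hypothetical names invented for this sketch, and the "gratuitous ARP" is only simulated as a dictionary update.

```python
# Toy model of a virtual IP (VIP) moving between servers.
# Every name and address here is hypothetical; nothing is sent on a network.

class ArpTable:
    """What a neighbouring router or host believes: IP -> MAC."""

    def __init__(self):
        self.entries = {}

    def gratuitous_arp(self, ip, mac):
        # An unsolicited ARP announcement overwrites the cached mapping.
        self.entries[ip] = mac

    def resolve(self, ip):
        return self.entries.get(ip)


class Server:
    def __init__(self, name, mac):
        self.name = name
        self.mac = mac
        self.ips = set()

    def claim(self, ip, arp_table):
        # Add the address to this server and announce it to the neighbours.
        self.ips.add(ip)
        arp_table.gratuitous_arp(ip, self.mac)

    def release(self, ip):
        self.ips.discard(ip)


def move_vip(vip, old, new, arp_table):
    """Hand the VIP from the old server to the new one."""
    old.release(vip)
    new.claim(vip, arp_table)


VIP = "192.0.2.10"  # documentation-range address, purely illustrative
arp = ArpTable()
old_box = Server("old", "02:00:00:00:00:01")
new_box = Server("new", "02:00:00:00:00:02")

old_box.claim(VIP, arp)               # traffic for the VIP reaches the old box
move_vip(VIP, old_box, new_box, arp)  # after the move it reaches the new box
print(arp.resolve(VIP))
```

The point of the sketch is that clients keep using the same address throughout; only the mapping behind it changes.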

markatto, nearly 8 years ago

They're still taking downtime for this... Even if they're forced to have a no-VIP no-HA no-LB setup (seems insane to me) it should be much simpler to set the DNS TTL to a low value right before and switch it to the new IP after the new box comes up.
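The timing behind the low-TTL approach can be sketched with a little arithmetic. This is only an illustration: `cutover_plan` and its parameters are hypothetical, times are plain seconds, and it assumes resolvers honour the published TTL, which some do not.

```python
def cutover_plan(old_ttl, low_ttl, cutover_time):
    """Return (latest time to lower the TTL, time by which caches hold the new IP).

    All values are in seconds on the same clock. Assumes resolvers respect TTLs.
    """
    # Caches may hold the old record for up to old_ttl seconds, so the TTL
    # must be lowered at least that long before the switch.
    lower_ttl_by = cutover_time - old_ttl
    # Once the record points at the new IP, stale answers expire within low_ttl.
    converged_by = cutover_time + low_ttl
    return lower_ttl_by, converged_by


# Example: a 1-hour TTL lowered to 60 seconds, with the switch at t = 10000.
lower_ttl_by, converged_by = cutover_plan(old_ttl=3600, low_ttl=60, cutover_time=10_000)
print(lower_ttl_by, converged_by)
```

Under these assumptions the window in which clients can still reach the old IP shrinks from an hour to about a minute.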

gargravarr, nearly 8 years ago

On the one hand I am very impressed they managed this, but on the other, it does seem very sledgehammer/nut-esque. Even without virtual IPs, it seems a little silly that their customers weren't running N+1 redundant instances that could be taken out, upgraded and then swapped without disrupting normal operations.

Again, very impressive as an academic exercise, especially considering the given script isn't actually that complicated, but wow, they had some serious guts running this in production!
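The N+1 pattern described here can be sketched as a rolling upgrade: take one instance out, upgrade it, put it back, and never drop below the capacity you need. This is a simulation only; `rolling_upgrade`, the instance names, and the `upgrade` callback are hypothetical.

```python
def rolling_upgrade(instances, required, upgrade):
    """Upgrade instances one at a time, keeping at least `required` in service."""
    in_service = set(instances)
    for inst in instances:
        # Refuse to proceed if removing this instance would leave too few.
        if len(in_service) - 1 < required:
            raise RuntimeError("no spare capacity to take an instance out")
        in_service.remove(inst)   # drain it from the pool
        upgrade(inst)             # upgrade while it serves no traffic
        in_service.add(inst)      # swap it back in
    return in_service


# Three instances where two are enough to carry the load (N = 2, plus one spare).
versions = {"app-1": "old", "app-2": "old", "app-3": "old"}


def upgrade(inst):
    versions[inst] = "new"


rolling_upgrade(list(versions), required=2, upgrade=upgrade)
print(versions)
```

With only N instances and no spare, the function raises instead of taking capacity away, which is exactly the situation that forces an in-place upgrade.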

sdiq, nearly 8 years ago

"It was like replacing the wheels on a moving vehicle"

That reminds me of this crazy video I once watched: https://youtube.com/watch?v=Cad8fyYeFnY

mankash666, nearly 8 years ago

I think the authors lost a good opportunity to move towards containers to avoid these problems in the future. While interesting academically, this approach is wrong for the long run.

astrodust, nearly 8 years ago

Given how much memory some servers have these days, which for an application node is often more than the necessary hard-disk capacity, this is quite a clever approach.

loa_in_, nearly 8 years ago

My first thought was that it talks about the `reboot` binary.

conatus, nearly 8 years ago

Very nice!

Don’t run this on any system you expect to be up they said, but we did it anyway
