科技回声
Windows Azure Service Disruption on Feb 29th, 2012

46 points by FrancescoRizzi, about 13 years ago

6 comments

panarky, about 13 years ago
tl;dr

1. On February 29, 2012, new certificates were created with a one-year expiration date by adding 1 to the year. Since February 29, 2013 is an invalid date, VMs wouldn't start.

2. After multiple attempts to restart failed VMs, physical hosts were marked as failed and VMs were migrated to other physical machines, so the problem propagated.

3. Management services were disabled to prevent customers from starting more VMs, compounding the problems.

4. After the leap-day bug was fixed, secondary failures were caused by mixing up incompatible versions of a networking plugin, so VMs had no network access.

5. Total duration of outages: about 16 hours.

6. 33% of a month's service will be credited to all customers, regardless of who was affected.
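To make point 1 concrete, here is a minimal Python sketch of the failure mode. It is not Azure's actual code: the helper name `naive_one_year_later` is hypothetical, and the only thing taken from the summary is the idea of building the expiration date by adding 1 to the year.

```python
from datetime import date


def naive_one_year_later(start: date) -> date:
    """Add one year by bumping only the year field (the fragile approach)."""
    # Hypothetical helper: mirrors the "add 1 to the year" idea from point 1.
    return date(start.year + 1, start.month, start.day)


issued = date(2012, 2, 29)
try:
    expires = naive_one_year_later(issued)
except ValueError as exc:
    # February 29, 2013 does not exist, so constructing the date fails.
    expires = None
    print(f"could not compute expiration for {issued}: {exc}")
```

The same helper works on every other day of the year, which is why a bug like this can stay hidden until a leap day arrives.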
pilif, about 13 years ago
*cough* http://thedailywtf.com/Articles/DATE_NOT_FOUND.aspx

And this is why you always use your framework's or language's date arithmetic library and never try to hack up a solution on your own. Date calculations alone are hard enough with the basic irregularities of month lengths. Add leap years and it becomes even harder.

And don't get me started on times, especially once time zones and summer time come into play.

Your particular hacked-together solution will likely fail at some point. And if it doesn't: was it worth all the effort you put into making it perfect, especially considering that somebody has already done it for your framework?

NIH at its finest.
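As a small illustration of the point about leaning on a library, the sketch below compares a hand-rolled leap-year check with Python's standard `calendar` module. The function `naive_is_leap` is a hypothetical example of the kind of shortcut being warned against; `calendar.isleap` and `calendar.monthrange` are the standard-library calls.

```python
import calendar


def naive_is_leap(year: int) -> bool:
    """Hand-rolled rule that forgets the century exceptions."""
    return year % 4 == 0


# Compare the shortcut with the library for a few years.
# calendar.monthrange(year, 2)[1] is the number of days in February.
for year in (1900, 2000, 2012, 2013, 2100):
    print(
        year,
        naive_is_leap(year),
        calendar.isleap(year),
        calendar.monthrange(year, 2)[1],
    )
```

The two checks agree for 2000, 2012, and 2013 but disagree for 1900 and 2100, which are divisible by 4 yet are not leap years.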
cypherpunks01, about 13 years ago
How do you all generally handle leap days when doing time math? If you're selling a service for one year, are you selling 365 days (02/28/12 - 02/26/13) or do you just give away the leap day for free (02/28/12 - 02/27/13)? Do you pay your salaried employees one day extra in a leap year?

What other leap year bugs have people run into? Generally the libraries I work with (e.g. Python's timedelta) don't let you add months or years because of their ambiguity.
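The two policies in the question can be sketched in Python as follows. This is only one possible convention, not an answer from the thread: `add_years_clamped` is a hypothetical helper that keeps the same month and day and falls back to the last day of the month when the target date does not exist. The end dates computed here are exclusive, meaning the first day after the term.

```python
import calendar
from datetime import date, timedelta


def add_years_clamped(start: date, years: int) -> date:
    """Same month and day in the target year, clamped to the month's last day."""
    target_year = start.year + years
    last_day = calendar.monthrange(target_year, start.month)[1]
    return start.replace(year=target_year, day=min(start.day, last_day))


start = date(2012, 2, 28)

# Policy A: a fixed 365-day term. timedelta only knows days, so this is unambiguous.
fixed_term_end = start + timedelta(days=365)

# Policy B: a calendar year, which quietly includes the leap day.
calendar_term_end = add_years_clamped(start, 1)

# A term starting on the leap day itself needs an explicit rule; here Feb 28 is chosen.
leap_day_renewal = add_years_clamped(date(2012, 2, 29), 1)

print(fixed_term_end, calendar_term_end, leap_day_renewal)
```

Clamping to February 28 is a business decision rather than a fact about calendars; rolling forward to March 1 would be equally defensible. The third-party python-dateutil package provides `relativedelta` for this kind of arithmetic if a dependency is acceptable.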
rdcastro, about 13 years ago
Working at Microsoft (in Windows Azure), this was the first outage since I joined the org, so I did not know what to expect from the company in terms of transparency on this outage. However, given other presentations or papers on the Windows Azure technology and how open they were publicly, I expected a good job here.

Bill Laing's post confirmed how transparent Microsoft wants to be with its customers, which is really nice. And I appreciate how seriously Microsoft is attempting to learn from these incidents and putting measures in place.
kogir, about 13 years ago
The article really is worth a read if you build complex systems. My takeaway from this is that you shouldn't schedule maintenance work during "weird" times.

Had they not been deploying new code on leap day (UTC), the outage would have been substantially less severe. Code that uses dates and times will have bugs, because it's hard. Don't complicate things further.

So from now on, no more leap day, daylight saving time, or New Year's maintenance. It's worth postponing a day just in case.
recoiledsnake, about 13 years ago
That seems to be incredibly well-detailed, much more than Amazon's or others' responses to their outages so far.