I understand a certain level of displeasure at their lack of specificity while they mitigated the issue. But... in this case, the time to remediate doesn't really change your response to the threat. No matter what, you need to change all your keys, generate new private keys, etc.<p>They got it fixed within 48 hours, globally, which, if you ask me, is incredible at their scale.<p>I would hardly describe anything AWS does as amateur. But maybe that's just me.
I read the words in your blog post and came to a completely different conclusion.<p>Despite not putting time stamps on their communications (which you seem really really upset about), they fixed everything for everybody in like a single day. You, their customer (and better still, <i>me</i>, their customer) didn't have to lift a finger.<p>This is exactly why we farm out infrastructure to companies like Amazon. They have a whole squad of smart people standing around leaning against a post all day every day waiting for something like this to happen so they can jump in and fix everything for us.<p>My little one-player business has no such team o' dudes on standby. If it weren't for the fact that Amazon is cleaning up for me, I'd be two days into having a <i>really bad time</i> and getting no productive work done.
For future reference, AWS security bulletins can be found at:<p><a href="https://aws.amazon.com/security/security-bulletins/" rel="nofollow">https://aws.amazon.com/security/security-bulletins/</a><p>The author suggests that Amazon only made one post, updating it throughout the process. But at the link above you will see four posts regarding this issue, with the first one having been updated once to add information. No, they are not timestamped, but they are dated.<p>While it may be fair to criticize AWS for its customer communications during this process, I'm fairly certain that if he had a backstage pass to such a comprehensive process of remediating thousands of production systems with zero downtime, he would perhaps find the team somewhat less... amateurish.
Obviously they were working as fast as they possibly could without risking major outages. They probably had millions of servers to update.<p>I'd even argue that it's not a good idea to advertise, "these servers are vulnerable to this attack".<p>AWS is massive, and organizing that kind of update by an army of engineers isn't easy.<p>You received a non-personalized message because AWS support was probably hearing from tens of thousands of irate customers demanding that their systems be patched immediately. For some reason they weren't equipped to handle that kind of volume, but I'm sure they will learn from this, and hopefully next time the response will be even faster.
So ... as a professional organization, why does Opbeat continue to contract with such an amateur organization? You could scale up your own servers, use round-robin DNS (in place of ELB), etc., if you're not happy with their performance.<p>Instead, a group of dedicated professionals updated a world-wide infrastructure in less than two days. If you were running your own systems, would you have managed that? Yes, you would have known exactly when you were done, but could you have predicted ahead of time when you'd be done?<p>So, as engineers, we make trade-offs, and AWS is a pretty clear winner when you look at the TCO of having a scalable architecture. Once you've made that trade-off, the downside is that you don't have the ultimate flexibility provided by a bare-metal host.
So they are amateurs because:<p>- They managed to create and test a deployment procedure in a couple of hours.
- Deployed this update to thousands of machines spread across multiple continents in 48 hours.
- There were no downtimes. No action required from customers.
- Everything seems to be in order now.<p>Yeah... I hope they die in a fire.
While the author is perhaps overly critical of the response of AWS to this issue, he has a point. When your business is suffering from an issue related to your provider and your customers are calling every 10 minutes to ask when <i>you</i> are going to fix it, you need your service provider to give you as much timely information as possible so you can relay it to your customers. The best thing they could have done would have been to provide gratuitous, even empty, updates every half hour. Even just saying "we're still working on it folks, but no new information" will keep people calm and provide needed feedback for concerned individuals.
Hopefully this openssl issue has shown large organizations the need to have a way to quickly roll out security patches, ideally even before the vendor has released an updated package.<p>Imagine tomorrow that someone finds a remotely exploitable kernel issue, perhaps involving UDP packet handling. If you have the right infrastructure in place, you should be able to drop the patch file in the right directory and run a script that builds a new system package, runs some automated testing, and then pushes that package out immediately using whatever rolling update strategy is normally used, but at an accelerated pace.<p>I wish I had time to build something that makes patching system packages on debian systems simpler, making it trivial for businesses to "fork" the distribution as necessary to work around issues (whether they be security critical or not). I've written more thoughts on the matter on my blog: <a href="http://stevenjewel.com/2013/10/hacking-open-source/" rel="nofollow">http://stevenjewel.com/2013/10/hacking-open-source/</a><p>If you're managing a smaller set of servers, I've been pretty happy with apticron and nullmailer as a way to make sure security updates are applied everywhere. It'd be nice if it could receive notification of security issues faster, perhaps via some sort of push mechanism, but it at least gets things taken care of within 24 hours.
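As a rough illustration of the "drop the patch file in the right directory and run a script" workflow described above, here is a minimal sketch (not the author's actual tooling) that rebuilds a Debian source package with a local patch. The package name, patch filename, and the final repo/rollout step are placeholder assumptions you would swap for your own infrastructure.

```python
#!/usr/bin/env python3
"""Sketch: rebuild a Debian package with a local patch, ready to hand off
to an internal apt repo / rolling-update pipeline. Package, patch, and the
push step are placeholders, not a real production pipeline."""
import pathlib
import subprocess
import tempfile

PACKAGE = "openssl"                               # placeholder package name
PATCH = pathlib.Path("fix-heartbleed.patch")      # placeholder local patch file

def run(cmd, cwd=None):
    """Run a command, echoing it first, and fail loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True, cwd=cwd)

def main():
    workdir = tempfile.mkdtemp(prefix="hotfix-")
    # Fetch the distro source package into a scratch directory
    # (requires deb-src entries in sources.list).
    run(["apt-get", "source", PACKAGE], cwd=workdir)
    srcdir = next(p for p in pathlib.Path(workdir).iterdir() if p.is_dir())
    # Apply the local patch on top of the distro source.
    run(["patch", "-p1", "-i", str(PATCH.resolve())], cwd=str(srcdir))
    # Bump the version with a local suffix so the forked build sorts above
    # the distro package and is replaced again by the next official update.
    run(["dch", "--local", "+hotfix", "Apply local security patch"], cwd=str(srcdir))
    # Build unsigned binary packages.
    run(["dpkg-buildpackage", "-b", "-us", "-uc"], cwd=str(srcdir))
    # The resulting .debs land next to the source directory; from here they
    # would go to whatever repo/rollout tooling you use (reprepro, aptly,
    # config management, etc. -- placeholder step).
    debs = sorted(pathlib.Path(workdir).glob("*.deb"))
    print("built:", *[d.name for d in debs])

if __name__ == "__main__":
    main()
```

From there, your existing rolling-update machinery (an internal apt repo plus config management, for instance) can push the package out at whatever pace the severity justifies, after whatever automated testing you normally run.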
BTW, I remember from the CloudFlare blog that they were notified of the bug in advance and had already patched it. How come big names like AWS and Heroku did not get this prior information? Who decides which companies hear about it before the public does?<p>From the Cloudflare blog: "This bug fix is a successful example of what is called responsible disclosure. Instead of disclosing the vulnerability to the public right away, the people notified of the problem tracked down the appropriate stakeholders and gave them a chance to fix the vulnerability before it went public."<p>-- <a href="http://blog.cloudflare.com/staying-ahead-of-openssl-vulnerabilities" rel="nofollow">http://blog.cloudflare.com/staying-ahead-of-openssl-vulnerab...</a>
Don't throw out the baby with the bathwater. I think most users would rather have the vulnerability patched within 48 hours than have perfect communication about it.<p>In this case Amazon chose to focus on fixing the problem versus communicating every detail. The potential consequence in dollars of not fixing the issue in a timely manner is likely in the triple-digit millions. I would imagine that having some small number of users complain about not being kept up to date isn't worth them shifting focus.<p>Also, I would imagine that customers such as Netflix and other major clients likely got more in-depth communication.<p>I'm sure it's not going to really affect Amazon's bottom line if Opbeat decides to move to a different provider. If you are a very small fish in a big ocean, expect to be treated that way. It's sad to say, but that's the reality.<p>Personally, I'll take them getting this fixed as fast as possible over hourly updates telling me how they're still working on it. For example, I just want to hear that they found the MH370 plane. I'm tired of reading news stories with the same depressing message.
This seems highly pedantic and nitpicky. It seems like you're looking for things to criticize.<p>Use of Heroku as an example of communications leadership is misplaced. I have had support requests sit around unanswered for days at Heroku. Don't get me wrong - I love Heroku. But you have to pay them an arm and a leg monthly in order to get quick support response times. AWS, on the other hand, seems very responsive to all requests.
The fact that all of this was fixed so quickly given the size of AWS infrastructure is in and of itself very impressive. Sometimes there isn't much to say except "We're working on it". You can argue semantics, but half the internet was in the same boat yesterday, and sure, maybe they communicated poorly, but "amateur hour" isn't fair imho.
I have experienced issues with several online services, as I expect everyone here has as well. I am impressed with how Heroku handled it: mandatory updates every 3 hours, clear instructions handed out to their customers, and apologies when the procedures caused inconvenience. Not everyone handles these things with such care.
Amateur hour at opbeat.com ... OP clearly has limited experience with large hosting vendors.<p>It's frustrating to wait for updated information, but AWS delivered reasonable details as they became available. Could it improve? Probably. Does it rate worse than other vendors? No, not at all. Consider recent Rackspace or Azure outages: information follows hours later, sometimes days.
This seems a bit harsh. While it would have been nice to have a bit more transparency, calling them amateur is not realistic. It takes time to patch a bunch (read: thousands) of servers, and a one-day turnaround is not all that bad.
If you think about the sheer number of logical load balancers they had to update (tens of thousands?) and the time it took them to update all of them, I'm incredibly impressed by their quick turnaround.