Is Amazon's cloud service too big to fail?

182 points by azureel, almost 8 years ago

20 comments

dalbasal, almost 8 years ago

This is (I was surprised) a pretty good article. Financial services are regulated and, based on recent experience, they're concerned with systemic risk. Most industries do not have anyone responsible for worrying about this kind of thing.

It seems reasonable to start worrying about the fragility potentially introduced by these massive internet infrastructure companies.

AmIFirstToThink, almost 8 years ago

If your architecture means your system goes down when AWS is down, then the question becomes: can you replace AWS with something better that you can build, have the means to build, have time to build, can keep running, and can get enough momentum behind, in terms of sheer size of customer base, to fund the upkeep of the platform?

If you can't build/run a better AWS replacement then it's a moot point, isn't it?

Then the question turns into: if you can't build a better AWS, can you architect your application to handle AWS failures? AWS itself lets you handle many kinds of failures at the AZ/DC level. Are you using that? For global AWS outages, can you have a skeleton, survival-critical system running on GCP or Azure?

Have you thought about outages that would be out of your control and out of AWS's control, e.g. malware, DDoS, DNS, ISP, or a Windows/Android/iOS/Chrome/Edge zero day? How are you going to handle outages due to those issues?

If you are prepared to handle outages (communication, self-preservation, degraded mode, offline mode) then can a serious AWS outage be managed just like those outages?
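
A minimal sketch of the fallback idea in the comment above: try the primary deployment, then a skeleton system on a second provider, and finally fall back to a degraded response. All names here (`first_available`, `aws_primary`, `gcp_skeleton`, the cached message) are hypothetical stand-ins for real service calls, not any provider's API.

```python
from typing import Callable, Sequence, Tuple

Backend = Tuple[str, Callable[[], str]]


def first_available(backends: Sequence[Backend], degraded: str) -> Tuple[str, str]:
    """Return (backend name, response) from the first backend that answers.

    If every backend raises, return the static degraded-mode response.
    """
    for name, call in backends:
        try:
            return name, call()
        except Exception:
            # Treat any error as "this backend is down" and try the next one.
            continue
    return "degraded", degraded


def aws_primary() -> str:
    # Hypothetical stand-in for the main deployment; simulates an outage.
    raise ConnectionError("primary region unreachable")


def gcp_skeleton() -> str:
    # Hypothetical stand-in for the survival-critical system on another provider.
    return "read-only mode"


source, response = first_available(
    [("aws", aws_primary), ("gcp", gcp_skeleton)],
    degraded="cached status page",
)
print(source, response)
```

In a real system the ordering, timeouts, and health checks would matter far more than this loop; the point is only that the degraded path is decided ahead of the outage rather than during it.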

barsonme, almost 8 years ago

Even at a smaller scale it is a little nerve-wracking to be so reliant on one provider. If AWS tanks there's a fair amount of code that'd need to be changed just to switch over to Azure or GCE. Failover with, e.g., email providers is easy enough, but the entire cloud stack (for lack of a better term) is a completely different ballgame.

jpalomaki, almost 8 years ago

This goes beyond having a plan B for hosting your own stuff somewhere else. Think about all the 3rd-party services you are depending on. Then think about how many dependencies those services have. How many trace back to Amazon on some level?

The connections that could cause problems may not be obvious. For example, a network provider running into trouble because a ticketing or monitoring system that depends on Amazon does not work. Or a hardware supplier not being able to ship spare parts for your on-premise SAN because a logistics company runs into trouble due to issues at Amazon.

forkLding, almost 8 years ago

Personally, as a dev, I find AWS's service somewhere between PayPal (shit, not sure why they're popular) and Stripe (damn, that was fast and easy), seeing as I've used them both.

Their support is alright, although you often have to pay for it. But the AWS docs are atrocious and remind me of university textbooks written by professors who like creating pseudo-scientific-sounding jargon, which, mixed with their huge array of features, is quite uncomfortable to use even for people with intermediate AWS experience (the "built some apps with AWS before" kind of people).

I can see that there could be more specialized services like Firebase (which is built on Google Cloud) that should be built on AWS for the users. Firebase is a breeze to use and very responsive, and I've used it to build real-time chat apps in a couple of days.

martyvis, almost 8 years ago

It took me three reads of the first couple of paragraphs to realise that "snowball" and "snowmobile" were actually hardware products that you can touch. Tech news publishers need to do a jargon check and use appropriate punctuation, formatting or something to call out terms that 90% of readers would not have come across.

galkk, almost 8 years ago

When I was working as a contractor for one of the big banks, whose dev was concentrated in Canary Wharf, they weren't able to successfully complete disaster recovery testing on their primary database cluster for 2 years in a row. I just don't remember whether it was department-wide or bank-wide.

Basically, every 6 months DR testing was failing and it was accepted as harsh reality. After seeing how they work inside, I don't think that moving their infrastructure to AWS/Azure/Google is the worst that could happen.

disc: Currently working at Amazon, but not at AWS.

jondubois, almost 8 years ago

That's why I think containerization and orchestration will be useful; open source orchestrators can standardize the infrastructure and make switching seamless. That way the infrastructure remains a commodity.

cm2187, almost 8 years ago

What would be great is the equivalent of the ACME protocol for cloud service providers. That will take a while and shouldn't happen until the offering matures and stabilises. But in an ideal world you wouldn't tie your application to a specific cloud provider. You should be able to lift and shift to another provider.

Which, I think, is a merit of using VMs as opposed to individual services.

acd, almost 8 years ago

Cloud services are concentrated by nature, built with the same cloned DNA. Of course that is a systemic risk, with so much of it concentrated in fewer physical locations running on the same code.

Think cloned bananas vs. fingers disease, but for computers. http://www.bbc.com/news/uk-england-35131751

cjsuk, almost 8 years ago

This does worry me. If there is a sudden shortage of resources or a DC fire that takes out a region, then what?

We have contingency against this via our own infrastructure, but I worry about organisations that don't have any.

blazespin, almost 8 years ago

The solution is pretty simple: AWS/Azure need to provide on-premise versions of their cloud. You'd probably get stuck with a particular version, but that's better than nothing.

fovc, almost 8 years ago

I think about this problem every now and then for my own business, but I'm not sure what the right answer is. Supporting multiple clouds requires more involved management of some pieces of infrastructure (e.g., DNS + healthchecks, DB replication), which introduces another point of failure.

How do people who need more nines of availability manage this issue with cloud providers? (EC2 and RDS promise 3.5 nines per AZ, but I imagine outages are somewhat correlated across zones.)
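
To put rough numbers on the correlation worry above, here is a small sketch. It assumes "3.5 nines" means 99.95% per zone, and it models correlation with a made-up `shared_fraction`: the share of a zone's downtime caused by events that hit every zone at once. Both the model and the 10% value are illustrative assumptions, not AWS figures.

```python
import math

HOURS_PER_YEAR = 24 * 365


def nines(availability: float) -> float:
    """Express availability as a number of nines (0.999 -> 3.0)."""
    return -math.log10(1.0 - availability)


def combined_availability(per_zone: float, zones: int, shared_fraction: float = 0.0) -> float:
    """Availability of a service that is up if at least one zone is up.

    shared_fraction is the (assumed) share of per-zone downtime that comes
    from common-cause events taking out all zones together; the rest is
    treated as independent between zones.
    """
    unavailability = 1.0 - per_zone
    shared = shared_fraction * unavailability
    independent = (1.0 - shared_fraction) * unavailability
    # Down if a shared event happens, or if every zone fails independently.
    down = 1.0 - (1.0 - shared) * (1.0 - independent ** zones)
    return 1.0 - down


def downtime_hours_per_year(availability: float) -> float:
    return (1.0 - availability) * HOURS_PER_YEAR


for label, a in [
    ("one zone", combined_availability(0.9995, 1)),
    ("two zones, independent", combined_availability(0.9995, 2)),
    ("two zones, 10% shared", combined_availability(0.9995, 2, 0.10)),
]:
    print(f"{label}: {nines(a):.2f} nines, {downtime_hours_per_year(a):.3f} h/year")
```

Under these assumptions the shared failures dominate: with 10% of downtime correlated, a second zone adds roughly one nine rather than doubling the count, which is why the independent-zones arithmetic tends to look better than reality.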

sharemywin, almost 8 years ago

Hasn't anyone heard of disaster recovery plans? I used to work at a medium-sized insurance company, and every year we had a project to update our disaster recovery plans, including for our main in-house datacenter going down. If it was a critical system you'd better have a plan to get it back up in like 4 hours. And those were business critical; we didn't have any life-critical systems.

nogbit, almost 8 years ago

Yes and no. By design it's not big, it just seems big. With relative RPO and RTO anyone can fail over to other regions. And if you aren't leveraging multiple AZs within a single region you need to rethink how you are using AWS.

The very nature of AWS requires Amazon to build in capabilities to handle failover. But, as they say at Amazon, "everything fails, always".

smegel, almost 8 years ago

Is it possible for AWS to have a multi-region outage - as in, is there anything connecting them that could bring them all (or several) down at once?

(Apart from the result of a botched patch or update to the core software stack that was done worldwide at the same time, which hopefully never happens.)

jriot, almost 8 years ago

Nothing is too big to fail. Society needs to be able to adapt and maintain a level of patience during transition times, i.e., be patient during the move to a new tool when Amazon's cloud fails.

zeep, almost 8 years ago

If Amazon's cloud service disappeared today, it would be chaos for a week or two, but most people should recover (as long as they have backups).

nhumrich, almost 8 years ago

For articles where the headline is a question, the answer is always "no".

amerine, almost 8 years ago

No. https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines