Sounds like the AWS architecture caused Netflix to write better code (read: more durable, more fault-tolerant). Fewer assumptions baked into the code, and it will be easier to port to a new data center/cloud architecture if AWS doesn't meet their needs.

As Netflix continues to scale, these changes will make managing that growth much easier.

A lot of you seem to take this post as being negative toward the AWS architecture. I take it more as a good collection of common things you need to watch out for in distributed environments, specifically the dangers of assumptions within your current infrastructure that may change dramatically as you scale.
Their "Chaos Monkey" approach reminds me of an excellent paper on "Crash Only Software": <a href="http://goo.gl/dqDII" rel="nofollow">http://goo.gl/dqDII</a><p>The best way to test the uncommon case is to make it more common.
This reads like the 'fallacies of distributed computing' paper (http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing).

While the likelihood of failure (or added latency, upstream changes, etc.) is greater in a large-scale distributed environment you don't control than in your home-grown datacenter, those scenarios are just facts of life in any distributed environment.

An awesome side effect of hosting an app in a cloud environment is that you must face up to those fallacies immediately, or they'll eat you alive.
I want a Chaos Monkey, too!

Actually, that was my first reaction, but after thinking for a moment, it isn't really a reliable way to test. If you make changes to something, you don't know for sure whether the Chaos Monkey struck while you were testing a particular thing. Proper unit tests would seem to be a lot more useful.
Basically the gist is: you need to be prepared for anything to stop working at any time.

The tone of this post suggests to me that the criticism of AWS and the problems Netflix experienced are understated, which I can understand given their position as a flagship AWS customer.
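To put that gist into code: "prepared for anything to stop working" mostly means every remote call carries a deadline and a bounded retry, because the other end can vanish mid-request. A rough sketch (the function, attempt counts, and timings are all made up):

    # Hypothetical defensive remote call: a deadline on every network
    # operation plus bounded retries with exponential backoff.
    import time
    import urllib.request

    def fetch_with_retry(url, attempts=3, timeout=2.0, backoff=0.5):
        last_error = None
        for attempt in range(attempts):
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.read()
            except OSError as e:   # urllib.error.URLError subclasses OSError
                last_error = e
                time.sleep(backoff * (2 ** attempt))   # back off and retry
        raise RuntimeError("%s unreachable after %d attempts"
                           % (url, attempts)) from last_error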
Hardware is always going to fail eventually. Moving to AWS caused Netflix to write better code to deal with those failures.

Failures were always going to happen, even in their own datacentre. What they have now is a more fault-tolerant system that should have less downtime overall.
If you do decide to adopt your very own pet Chaos Monkey in your next project, make sure you ARE able to gracefully degrade your service in case of failures. Otherwise your customers will see the monkey in action, manifested as "we'll be back shortly" messages. That's easier said than done, since much of the time we forget to write (or feel too lazy to write, or have no idea how to properly handle) the "else" branches for errors, unavailable services, and unreachable databases.

Otherwise, good idea. It forces you to think about the perils of a distributed environment from the very beginning, rather than leaving it as an afterthought.
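For instance (a hypothetical sketch; the service and cache objects are stand-ins): when a dependency dies, hand back a stale or generic answer instead of an error page.

    # Hypothetical graceful degradation: if the recommendations service is
    # down, serve the last cached list, or a generic default, not an error.
    def get_recommendations(user_id, service, cache):
        try:
            recs = service.fetch(user_id)   # may raise on timeout/outage
            cache[user_id] = recs           # keep the fallback copy fresh
            return recs
        except Exception:
            # The "else" branch everyone forgets: degrade, don't die.
            return cache.get(user_id, ["most-popular-1", "most-popular-2"])

The customer gets a slightly worse page instead of a "we'll be back shortly" one, and the monkey stays invisible.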
I love the idea of setting up a fully working system on AWS, then replaying all traffic from your live site over to it to see how it stands up under load.

No need to simulate traffic for testing purposes. Here's our *actual* traffic. All of it.

Nice.
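For anyone curious what that looks like in miniature, here's a toy sketch (entirely hypothetical; in practice the mirroring is usually done at the load balancer or proxy layer, and the hostnames below are made up). Each request is answered by the primary backend while a copy is fired at the shadow stack and its response thrown away:

    # Toy shadow-traffic mirror (hypothetical). Real users are served by the
    # primary; the shadow stack gets an identical copy of every request, and
    # its failures must never be allowed to affect the real response.
    import concurrent.futures
    import urllib.request

    PRIMARY = "http://primary.internal"     # illustrative hostnames
    SHADOW = "http://shadow-on-aws.internal"

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=32)

    def mirror(path):
        # Fire-and-forget: swallow every shadow-side error.
        try:
            urllib.request.urlopen(SHADOW + path, timeout=5).read()
        except Exception:
            pass

    def handle(path):
        pool.submit(mirror, path)                             # copy to shadow
        return urllib.request.urlopen(PRIMARY + path).read()  # real answer

The asymmetry is the whole trick: the shadow sees production-shaped load, but only the primary's answer ever reaches a customer.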
I'll bet other companies that use AWS/EC2 (e.g. Heroku, Dropbox) would have similar things to say.

As a guy with an IT background, I did have one question: they expected stability? Really? I always expect host/app/system failure, and am pleasantly surprised when it doesn't happen.