They generally send you an advance email. I just had to migrate our Jenkins server a week or two ago because of this; I received something like 15 days' notice on that one.<p>But obviously, if there's a hard failure, they won't always be able to give you as much warning as you'd want. Generally speaking, you should have accounted for this situation ahead of time in your engineering plans. Amazon EC2 doesn't have anything like vMotion; it's just a bunch of Xen virts.<p>If you're using the GUI, the first time you try a shutdown it will issue a normal request, but if you go back and try again while the first request is still pending, you should see the option for a forced ("hard") stop. Try that and give it some time; sometimes it takes an hour or two to go through. Otherwise, Amazon's tech support can help you.
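The same escalation can be scripted against the API. A minimal sketch (the retry convention here is illustrative; `StopInstances` with `Force=True` is the API equivalent of the console's hard-stop option):

```python
def build_stop_request(instance_id, attempt):
    """Build kwargs for EC2's StopInstances call.

    First attempt: a normal, graceful stop. Subsequent attempts set
    Force=True, the API equivalent of the console's hard stop (it
    skips the OS shutdown, so unflushed data may be lost).
    """
    params = {"InstanceIds": [instance_id]}
    if attempt > 1:
        params["Force"] = True
    return params

# With boto3 this would be passed along as, e.g.:
#   boto3.client("ec2").stop_instances(**build_stop_request("i-0abc", attempt))
```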
Remember kids, an EC2 instance is not a server. It's a process on someone else's server, and all of your data is stored in /tmp. Do plan accordingly.
OK, the key to working with AWS EC2 instances is to remember that they are ephemeral and can disappear at any point in time. If you're treating one like a traditional server that you have in a rack, you're doing it wrong. Just turn it off and start a new one. You are using a configuration manager (Puppet, Chef, etc.), aren't you?
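One way to make "turn it off and start a new one" painless is to bake the config-management bootstrap into the instance's user data. A hedged sketch, assuming a Puppet setup (the server hostname and role-based certname scheme are made up for illustration):

```python
def bootstrap_user_data(role, puppet_server="puppet.internal.example"):
    """Build a cloud-init user-data script that installs Puppet and
    enrolls a fresh instance on first boot. The hostname and the
    role-based certname scheme here are illustrative, not a standard."""
    return "\n".join([
        "#!/bin/bash",
        "apt-get update && apt-get install -y puppet",
        "puppet agent --server {0} --certname {1}-$(hostname)"
        " --onetime --no-daemonize".format(puppet_server, role),
    ])
```

Pass the result as the `UserData` parameter when launching the replacement, and the new node configures itself without any hand-holding.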
Not only do they send you an e-mail about this, they even have an API call for it: <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html" rel="nofollow">http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitorin...</a><p>Anyone who's surprised that this happens has not used EC2 very much. It is this way by design.
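Those scheduled events show up in `DescribeInstanceStatus` responses, so you can poll for them instead of waiting on the email. A sketch of a helper that pulls them out of a response dict (the sample payload mirrors the documented response shape; the instance ID is made up):

```python
def scheduled_events(response):
    """Extract (instance_id, event_code, description) tuples from a
    DescribeInstanceStatus response dict."""
    events = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            events.append((status["InstanceId"],
                           event["Code"],
                           event.get("Description", "")))
    return events

# Sample payload in the shape the API documents:
sample = {"InstanceStatuses": [{
    "InstanceId": "i-0abc",
    "Events": [{"Code": "system-reboot",
                "Description": "scheduled reboot",
                "NotBefore": "2014-01-01T00:00:00Z"}],
}]}
print(scheduled_events(sample))  # [('i-0abc', 'system-reboot', 'scheduled reboot')]
```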
I think I'm missing something. Why isn't Amazon sorting this out behind the scenes so that any failing hardware is seamlessly replaced and the user is none the wiser? Am I expecting too much?
I'm working with another team of people who haven't yet tried working with cloud servers, and one of the things they're struggling with the most is that cloud servers need to be thought of as disposable. They can't easily digest the idea that servers can and will go down randomly for no known reason.<p>I think Amazon needs to put a lot more effort into educating people about the best practices involved here: creating immutable and disposable servers, making it easier (e.g. via console access) to create availability groups, etc.
I've gotten one of those emails and thought: OK, it's going to reboot, not a problem for that instance, it has no persistent data I care about.<p>Then it kept running, but there was no way to reboot it from the EC2 console or over ssh, so that was a bit of a problem; I had to get support to do it.<p>Moral: reboot it yourself at a convenient time.
To work in AWS's system you must have redundant nodes, such that any single node can be rebooted without affecting the system as a whole.<p>Notification that your instance is on old hardware that has been deprecated is part of the price of doing business in this cloud system.<p>As others have noted, yes, it is a little tense (is this my production database or my continuous integration machine?). The email you get just gives you an instance ID, so you have to look it up.<p>But AWS has enough components to help you build resilient systems that, if you've done your job correctly, you shouldn't care about these messages beyond the labor of spinning up a replacement.
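Mapping that instance ID back to a human-readable name is a small lookup against `DescribeInstances`. A sketch that reads the `Name` tag out of a response dict (the sample payload follows the documented response shape; the ID and tag value are invented):

```python
def name_for_instance(response, instance_id):
    """Find the Name tag for instance_id in a DescribeInstances
    response dict; returns None if the instance or tag is absent."""
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("InstanceId") != instance_id:
                continue
            for tag in inst.get("Tags", []):
                if tag.get("Key") == "Name":
                    return tag.get("Value")
    return None

# Sample payload in the shape the API documents:
sample = {"Reservations": [{"Instances": [{
    "InstanceId": "i-0abc",
    "Tags": [{"Key": "Name", "Value": "ci-worker-3"}],
}]}]}
```

So a maintenance email naming `i-0abc` resolves to `ci-worker-3`, and you know immediately whether it's the CI box or something you actually need to worry about.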
Reminds me of <a href="http://www.goodreads.com/quotes/379100-there-s-no-point-in-acting-surprised-about-it-all-the" rel="nofollow">http://www.goodreads.com/quotes/379100-there-s-no-point-in-a...</a>
This is somewhat unrelated, but what's the general consensus on the security of EC2 for <i>very</i> sensitive computation?<p>For example, I have a client who has some algorithms and data that are potentially quite valuable. EC2 and other AWS services would be a huge help with their project, but is there <i>any way</i> measures could be taken to ensure that no one - even Amazon employees - can get to their code and data?<p>Edit: devicenull makes some good points - I guess I had the CIA's $600 million AWS contract in my head when asking my question.
War story: I was once called in to scale an application that had been running on AWS for six or seven months and was failing under excessive traffic. Normally a good problem to have, but this turned into a difficult one because the application stored critical data on an EBS volume, and those, of course, can't be attached to more than one instance. The only solution was to move to increasingly larger instances until the application could be rewritten.
Moral: If you are on the "cloud", make sure your application design fits your infrastructure.
Once upon a time there was EC2, without EBS. It was actually a pretty good place to be. There was no ambiguity because everyone who used EC2 was given a lot of warnings about how they'd have to architect their systems to avoid critical failure. I wonder if the introduction of EBS has actually increased data loss because people aren't as paranoid about it.
What's the point of this entry?
Are we surprised that hardware fails?
I am the complete opposite of an EC2 fanboy, but every time they decided to shut down a machine they had the good taste to send us an email.