Finally. It's crazy that they didn't implement this earlier, and why isn't it enabled by default, like on GCE? For a long time we've had an app that just polls the EC2 API, looks for impaired instances, and automatically restarts them. We get about 2-10 impaired, scheduled-for-reboot, or on-deprecated-hardware instances per month, so that app is quite a time-saver.
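For anyone who wants to roll their own poller like the one described above, here is a minimal sketch. It is not the commenter's app. The function names `find_impaired` and `restart` are made up for illustration, and treating any scheduled event as a reason to restart is an assumption. The client is passed in, so the code takes anything shaped like a boto3 EC2 client.

```python
from typing import Any, Dict, List


def find_impaired(ec2: Any) -> List[str]:
    """Return IDs of running instances with an impaired status check
    or at least one scheduled event (assumption: any event counts)."""
    flagged: List[str] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = ec2.describe_instance_status(**kwargs)
        for status in page.get("InstanceStatuses", []):
            system = status.get("SystemStatus", {}).get("Status")
            instance = status.get("InstanceStatus", {}).get("Status")
            if "impaired" in (system, instance) or status.get("Events"):
                flagged.append(status["InstanceId"])
        token = page.get("NextToken")
        if not token:
            return flagged
        kwargs["NextToken"] = token  # fetch the next page of results


def restart(ec2: Any, instance_ids: List[str]) -> None:
    """Stop, wait, then start. For EBS-backed instances a stop/start
    normally lands the instance on different hardware."""
    if not instance_ids:
        return
    ec2.stop_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)
    ec2.start_instances(InstanceIds=instance_ids)
```

With boto3 installed you would call it roughly as `ec2 = boto3.client("ec2"); restart(ec2, find_impaired(ec2))` from a cron job or a loop. Do not run it against instance-store instances: stopping is not possible there, and you lose the data if you terminate.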
Please note that this is for EBS-backed instances only.<p>If you want something similar for ephemeral instances, do what we do: Auto Scaling groups with min 1 and max 1. We've found that Amazon is pretty good at catching bad instances and terminating them, although on occasion we do have to terminate an instance manually. The Auto Scaling group takes care of the rest.
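A rough sketch of what a "min 1 max 1" group looks like as API parameters, in case it helps. The helper `singleton_asg_params` and the example names are hypothetical. The keys are the ones boto3's `create_auto_scaling_group` accepts, and it assumes you already have a launch configuration.

```python
from typing import Any, Dict, List


def singleton_asg_params(
    name: str, launch_config: str, availability_zones: List[str]
) -> Dict[str, Any]:
    """Build keyword arguments for a group that always holds exactly one instance."""
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_config,  # assumed to exist already
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        "AvailabilityZones": availability_zones,
        # "EC2" means the group replaces instances that fail EC2 status checks.
        "HealthCheckType": "EC2",
    }
```

Usage would be along the lines of `boto3.client("autoscaling").create_auto_scaling_group(**singleton_asg_params("bastion", "bastion-lc", ["us-east-1a", "us-east-1b"]))`. Listing more than one zone lets the replacement come up elsewhere if a zone is having a bad day.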
Heavy EC2 user here. This doesn't solve your problems. If you want to do this right, set up an EC2 Auto Scaling group and build a new image each time you need to change your server. That is the proven way most large deployments work, including Netflix's.
At the risk of being downvoted, let me say that this is yet another AWS "feature" that is primarily a workaround for deficiencies in the platform.
Any reason why this isn't automatic? From the "Recover your instance" docs:<p><pre><code> Examples of problems that cause system status checks to
fail include:
* Loss of network connectivity
* Loss of system power
* Software issues on the physical host
* Hardware issues on the physical host
</code></pre>
All of these are on the physical host, which end users cannot control. So if AWS has an issue that kills your VM and you don't have this set up, your instance is effectively dead?
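For reference, "having this set up" means a CloudWatch alarm on the system status check with the recover action attached. A hedged sketch of the parameters follows. The helper name `recover_alarm_params`, the alarm name, and the period and evaluation values are illustrative assumptions. The metric name and the `arn:aws:automate:<region>:ec2:recover` action are the ones AWS documents for this feature.

```python
from typing import Any, Dict


def recover_alarm_params(instance_id: str, region: str, periods: int = 2) -> Dict[str, Any]:
    """Build keyword arguments for a CloudWatch alarm that triggers EC2 recovery."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",  # host-level failures only
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # seconds; illustrative value
        "EvaluationPeriods": periods,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

You would pass the result to `boto3.client("cloudwatch").put_metric_alarm(**params)`. The alarm is per instance, so without it a failed system status check just sits there until someone notices.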
The ugly caveat isn't VPC, it's EBS.<p>This lands on the wrong side of pets-versus-cattle. AWS has been moving towards giving people what they want, but it's still best practice to use ephemeral storage and architect accordingly.
I think CodeDeploy is quite an undervalued AWS tool. It's a combination of Puppet for server config and Heroku-style deploys. Together with AutoScaling it makes it trivial to set up any number of identical servers, without relying on custom AMIs or recovery.
Wouldn't transparent migration to new hardware be even better? Isn't one of the advantages of virtualization the ability to move a running image from one machine to another?
An important note if you want to use this right away:<p><i>This feature is currently available for the C3, C4, M3, R3, and T2 instance types running in the US East (Northern Virginia) region; we plan to make it available in other regions as quickly as possible. The instances must be running within a VPC, must use EBS-backed storage, but cannot be Dedicated Instances.</i>
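If you want to check a fleet against those restrictions, something like the sketch below would do it. The function name `can_auto_recover` is made up. The supported families and region are copied from the quoted announcement and will go stale as AWS expands the feature. The dictionary keys are the ones `describe_instances` returns for each instance.

```python
from typing import Any, Dict

# Taken from the launch announcement quoted above; expect this to change.
SUPPORTED_FAMILIES = {"c3", "c4", "m3", "r3", "t2"}
SUPPORTED_REGIONS = {"us-east-1"}


def can_auto_recover(instance: Dict[str, Any], region: str) -> bool:
    """Check one describe_instances entry against the launch restrictions."""
    family = instance.get("InstanceType", "").split(".")[0]
    tenancy = instance.get("Placement", {}).get("Tenancy", "default")
    return (
        region in SUPPORTED_REGIONS
        and family in SUPPORTED_FAMILIES
        and bool(instance.get("VpcId"))  # must run inside a VPC
        and instance.get("RootDeviceType") == "ebs"  # EBS-backed only
        and tenancy != "dedicated"  # Dedicated Instances are excluded
    )
```

For example, an `m3.large` in a VPC with an EBS root volume and default tenancy passes in `us-east-1`, while the same instance with an instance-store root volume does not.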
This should be a great fit for a NAT/bastion instance, since the high-availability setup has a few drawbacks: <a href="https://aws.amazon.com/articles/2781451301784570" rel="nofollow">https://aws.amazon.com/articles/2781451301784570</a>
If you rely on something like this, you rely on nothing. This is like crutches for your broken architecture. For singleton roles, you could do an autoscaling group of one and do better.