TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Auto Recovery for Amazon EC2

167 points by tshtf over 10 years ago

13 comments

CarlHoerberg over 10 years ago
Finally! It's crazy that they haven't implemented this earlier, and why isn't it enabled by default, as it is on GCE? For a long time we've had an app that just polls the EC2 API, looks for impaired instances, and automatically restarts them. We see about 2-10 instances per month that are impaired, scheduled for reboot, or running on deprecated hardware, so that app is quite a time-saver.
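The comment doesn't show the app itself, so here is a minimal sketch of just its selection step, assuming the input has the shape of EC2's DescribeInstanceStatus response (for example, what boto3's `ec2.describe_instance_status(IncludeAllInstances=True)` returns). It flags instances whose system or instance status is `impaired`, or that carry a scheduled event; the function name and sample instance IDs are hypothetical, and the actual restart call is deliberately left out.

```python
from typing import Any

# Event codes EC2 attaches to instances with scheduled events
# (reboots, maintenance, retirement, stop).
SCHEDULED_EVENT_CODES = {
    "instance-reboot",
    "system-reboot",
    "system-maintenance",
    "instance-retirement",
    "instance-stop",
}


def instances_needing_attention(response: dict[str, Any]) -> list[str]:
    """Return IDs of instances that are impaired or have a scheduled event.

    Hypothetical helper; `response` is assumed to be shaped like a
    DescribeInstanceStatus result.
    """
    flagged = []
    for status in response.get("InstanceStatuses", []):
        system = status.get("SystemStatus", {}).get("Status")
        instance = status.get("InstanceStatus", {}).get("Status")
        events = {event.get("Code") for event in status.get("Events", [])}
        # Either status check reporting "impaired", or any scheduled event, flags it.
        if "impaired" in (system, instance) or events & SCHEDULED_EVENT_CODES:
            flagged.append(status["InstanceId"])
    return flagged


# Made-up sample data in the DescribeInstanceStatus shape.
sample = {
    "InstanceStatuses": [
        {"InstanceId": "i-aaa", "SystemStatus": {"Status": "impaired"}, "InstanceStatus": {"Status": "ok"}},
        {"InstanceId": "i-bbb", "SystemStatus": {"Status": "ok"}, "InstanceStatus": {"Status": "ok"},
         "Events": [{"Code": "system-reboot"}]},
        {"InstanceId": "i-ccc", "SystemStatus": {"Status": "ok"}, "InstanceStatus": {"Status": "ok"}},
    ]
}
print(instances_needing_attention(sample))  # ['i-aaa', 'i-bbb']
```

Keeping the selection logic as a pure function over the response makes it easy to test without AWS credentials; the polling loop and whatever remediation you choose would wrap around it.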
bmurphy1976 over 10 years ago
Please note that this is for EBS-backed instances only.

If you want something similar for ephemeral instances, do what we do: Auto Scaling groups with min 1, max 1. We've found that Amazon is pretty good at catching bad instances and terminating them, although on occasion we do have to terminate an instance manually. The Auto Scaling group takes care of the rest.
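The "min 1 max 1" pattern can be sketched as the keyword arguments you might pass to boto3's `autoscaling.create_auto_scaling_group`. This is a minimal sketch under stated assumptions: the group runs in a VPC (so subnets go in `VPCZoneIdentifier`), it uses a launch configuration, and the helper, group, launch-configuration, and subnet names are all hypothetical.

```python
def single_instance_asg_params(name: str, launch_config: str, subnet_ids: list[str]) -> dict:
    """Hypothetical helper: arguments for a group pinned to exactly one instance."""
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_config,
        # Min, max, and desired all 1: a terminated instance is always replaced,
        # and the group never grows beyond one.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Comma-separated subnet IDs; listing subnets in several AZs lets the
        # replacement launch in a different zone.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # With "EC2" health checks, the group judges health from the instance's
        # EC2 state and status checks.
        "HealthCheckType": "EC2",
    }
```

Usage would look like `boto3.client("autoscaling").create_auto_scaling_group(**single_instance_asg_params("worker-asg", "worker-lc-v1", ["subnet-111", "subnet-222"]))`. Because the replacement boots fresh from the AMI, anything on instance-store volumes is gone, which is why this approach assumes the server can rebuild its state from the image or its bootstrap.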
oellegaard over 10 years ago
Heavy EC2 user here. This doesn't solve your problems. If you want to do this right, set up an EC2 Auto Scaling group and build a new image each time you need to change your server. That is the proven approach most large deployments use, including Netflix.
bkeroack over 10 years ago
At the risk of being downvoted, let me say that this is yet another AWS "feature" that is primarily a workaround for deficiencies in the platform.
biot over 10 years ago
Any reason why this isn't automatic? From the "Recover your instance" docs:

    Examples of problems that cause system status checks to fail include:
    * Loss of network connectivity
    * Loss of system power
    * Software issues on the physical host
    * Hardware issues on the physical host

All of these are on the physical host, which end users cannot control. So if AWS has an issue that kills your VM and you don't have this set up, is your instance effectively dead?
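The reason it isn't automatic is that recovery is opted into per instance: you create a CloudWatch alarm on the `StatusCheckFailed_System` metric whose action is the region-specific `arn:aws:automate:<region>:ec2:recover` ARN. A minimal sketch of the arguments for boto3's `cloudwatch.put_metric_alarm` follows; the helper name, alarm name, 60-second period, and two evaluation periods are illustrative assumptions, not values from the thread.

```python
def recover_alarm_params(instance_id: str, region: str) -> dict:
    """Hypothetical helper: alarm that triggers EC2 auto recovery for one instance."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        # Only the *system* status check (problems on the physical host)
        # is wired to recovery here.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        # The metric is 1 while the check is failing and 0 otherwise.
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The recover action ARN embeds the instance's region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

Usage would be `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**recover_alarm_params("i-0123456789abcdef0", "us-east-1"))`, with a placeholder instance ID. Failures of the *instance* status check (problems inside the guest OS) are not covered by this action and remain the owner's responsibility.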
alrs over 10 years ago
The ugly caveat isn't VPC, it's EBS.

This lands on the wrong side of pets-versus-cattle. AWS has been moving towards giving people what they want, but it's still best practice to use ephemeral storage and architect accordingly.
saryant over 10 years ago
I've been having a lot of issues with r3.large instances becoming unreachable lately. Hoping this can serve as a stopgap.
andr over 10 years ago
I think CodeDeploy is quite an undervalued AWS tool. It's a combination of Puppet for server config and Heroku-style deploys. Together with Auto Scaling, it makes it trivial to set up any number of identical servers without relying on custom AMIs or recovery.
tedunangst over 10 years ago
Wouldn't transparent migration to new hardware be even better? Isn't one of the advantages of virtualization the ability to move a running image from one machine to another?
fletchowns over 10 years ago
An important note if you want to use this right away:

"This feature is currently available for the C3, C4, M3, R3, and T2 instance types running in the US East (Northern Virginia) region; we plan to make it available in other regions as quickly as possible. The instances must be running within a VPC, must use EBS-backed storage, but cannot be Dedicated Instances."
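The launch restrictions quoted in this comment can be turned into a quick pre-flight check. This is a minimal sketch, assuming each instance is a dict shaped like an entry in the `Instances` list returned by EC2's DescribeInstances; the helper name and constants are hypothetical, and the sets encode only the launch-time restrictions (AWS has since widened availability, so treat them as a snapshot).

```python
# Launch-time restrictions from the announcement quoted above.
LAUNCH_FAMILIES = {"c3", "c4", "m3", "r3", "t2"}
LAUNCH_REGION = "us-east-1"  # US East (Northern Virginia)


def recovery_blockers(instance: dict) -> list[str]:
    """Hypothetical helper: reasons an instance falls outside the launch restrictions."""
    reasons = []
    # "c4.large" -> "c4"
    family = instance.get("InstanceType", "").split(".")[0]
    if family not in LAUNCH_FAMILIES:
        reasons.append(f"instance family {family!r} not supported")
    placement = instance.get("Placement", {})
    az = placement.get("AvailabilityZone", "")
    # A standard AZ name is the region name plus one trailing letter ("us-east-1a").
    if az[:-1] != LAUNCH_REGION:
        reasons.append(f"availability zone {az!r} is outside {LAUNCH_REGION}")
    if not instance.get("VpcId"):
        reasons.append("not running in a VPC")
    if instance.get("RootDeviceType") != "ebs":
        reasons.append("not EBS-backed")
    if placement.get("Tenancy") == "dedicated":
        reasons.append("Dedicated Instance")
    return reasons
```

An empty list means the instance met every quoted condition; returning reasons rather than a boolean makes it clearer why a given instance would be rejected.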
j-kidd over 10 years ago
This should be a great fit for the NAT/bastion instance, since the high-availability setup has a few drawbacks: https://aws.amazon.com/articles/2781451301784570
kolev over 10 years ago
If you rely on something like this, you rely on nothing. This is a crutch for your broken architecture. For singleton roles, you could use an Auto Scaling group of one and do better.
halayli over 10 years ago
This makes me so happy.