If you use Terminal on top of AWS (one deployment option), we can just migrate your workloads without rebooting.

The way it works is that RAM pages are copied from one machine to another in real time, and once the RAM is almost synchronized the IP address is switched over to the new box (then you let Amazon reboot your old box, and you migrate back post-upgrade if you want to).

You can try it out on our public cloud at terminal.com if you'd like to. On our public cloud we auto-migrate all of our customers off the degrading hardware before it reboots, but you can control that yourself if you're running Terminal as your infrastructure.
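To make the idea concrete, here is a minimal simulated sketch of that pre-copy loop; the page store, the dirtying workload, and the thresholds are all made up for illustration, and the real thing lives in the hypervisor toolstack rather than in guest-level Python:

```python
import random

PAGES = 4096              # number of RAM pages on the (simulated) source VM
MAX_ROUNDS = 30           # fall back to a forced stop-and-copy after this many rounds
CUTOVER_THRESHOLD = 16    # "almost synchronized": few enough pages to copy while paused

source = {i: f"page-{i}" for i in range(PAGES)}   # source RAM
target = {}                                        # target RAM being built up
dirty = set(source)                                # initially nothing is synced

def workload_dirties_pages():
    """Simulate the guest touching memory while the migration is in flight."""
    for page in random.sample(range(PAGES), k=random.randint(1, 64)):
        source[page] = f"page-{page}-v{random.randint(0, 9)}"
        dirty.add(page)

round_no = 0
while len(dirty) > CUTOVER_THRESHOLD and round_no < MAX_ROUNDS:
    round_no += 1
    to_copy, dirty = dirty, set()
    for page in to_copy:              # background copy of the current dirty set
        target[page] = source[page]
    workload_dirties_pages()          # guest keeps running and re-dirties some pages
    print(f"round {round_no}: copied {len(to_copy)} pages, {len(dirty)} dirtied meanwhile")

# Final cutover: pause the guest, copy the small remainder, then move the IP.
for page in dirty:
    target[page] = source[page]
print(f"cutover: copied final {len(dirty)} pages, switching the IP to the new host")
assert target == source
```

The brief pause at the end is the only downtime the guest sees, which is why the cutover threshold (and the max-rounds fallback for workloads that dirty memory faster than it can be copied) matters.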
It's a bit odd that they don't stop launching new VMs on the old hardware. That would allow people who wanted to control the transition to just stop and start their VMs.
Been there, done that. AWS re:Boot in September 2014 showed us how worthwhile it was to invest in Ansible roles for every part of our infrastructure. Still, it was a lot of hassle for the Ops team, especially since it happened during DevOps Days Warsaw ;-) AWS also said '10%' back then, but for us it was 81 out of ~300 instances.

What is sad is that we learned about it from Hacker News and not from AWS, even though we have premium support and our own account manager. :/

Let's see how many of us did their homework after the previous "xen update", and how much "10%" turns out to be this time ;-)
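If you'd rather not rely on Hacker News for notification, EC2 does expose scheduled maintenance events through the API. A minimal sketch, assuming boto3 is installed and credentials are configured (the region below is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # placeholder region

# Scheduled events (system reboots, maintenance, retirement) show up here
# before the maintenance window, so you can alert on them yourself.
resp = ec2.describe_instance_status(IncludeAllInstances=True)
for status in resp["InstanceStatuses"]:
    for event in status.get("Events", []):
        # Event codes include values like "system-reboot" and "system-maintenance".
        print(status["InstanceId"], event["Code"], event.get("NotBefore"), event["Description"])
```

Wiring that into whatever already pages the Ops team at least removes the "found out from the news" part of the problem.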
Linode forced a reboot for us last night as well. For some reason they did not disclose why, even though I pointedly asked. Downtime was ~20 minutes.

This must be some seriously bad mojo to force reboots with little to no notice more than a week before the vulnerabilities are scheduled to leave embargo.
Related: Five new undisclosed Xen vulnerabilities (xen.org): https://news.ycombinator.com/item?id=9116937
We contacted SoftLayer about this issue; they literally had not heard anything about it and said they would "contact their datacenter team".

If they treat it like the last round of Xen vulnerabilities, they will simply put a warning on their dashboard an hour beforehand, without sending out any form of email notice. The first we knew about it was when we started receiving alerts from Nagios.
Rackspace notice regarding the same patch:

https://community.rackspace.com/general/f/53/t/4978

I wasn't able to find anything on Digital Ocean's public-facing site.
Does anybody know what this 10% means? I mean:

a) only 10% of the fleet is running a version of the hypervisor affected by the bug;

b) based on turnover rate, they expect only 10% of instances to still be running for customers, and thus need rebooting, by the date the bugs are released;

c) 10% are running a combination of the affected hypervisor and VMs that are reasonably at risk of exploitation; others may have the faulty hypervisor but are either single-tenant (so there is no risk of someone breaking out and affecting someone else) or running VMs that may not be able to break out, depending on the nature of the bugs.

Just speculating, any ideas?