Before the doom and gloomers come out, this is the first time I can remember it happening since GAE left beta.<p>We left AWS about 18 months ago after one of the outages and switched to GAE. I've counted 3-4 big downtimes for AWS compared to this one on GAE. That's still a good decision (for now)...
I think this is larger than just GAE.<p><a href="http://internettrafficreport.com/namerica.htm" rel="nofollow">http://internettrafficreport.com/namerica.htm</a><p>It seems like large portions of the internet are down.
It's time we remembered the whole strength of the internet was that it was distributed and we avoided introducing single points of failure. We have ended up using vast amounts of infrastructure for no reason other than developer convenience (often with respect to security), when having local direct connections is often more suitable than shooting everything into the cloud.
> "App Engine is currently experiencing serving issues. The team is actively working on restoring the service to full strength. Please follow this thread for updates."<p>-- Max Ross (Google) maxr@google.com via googlegroups.com<p><a href="https://groups.google.com/forum/?fromgroups=#!topic/google-appengine-downtime-notify/SMd2pDJsCPo" rel="nofollow">https://groups.google.com/forum/?fromgroups=#!topic/google-a...</a>
And they've sent the all-clear:<p>At this point, we have stabilized service to App Engine applications. App Engine is now successfully serving at our normal daily traffic level, and we are closely monitoring the situation and working to prevent recurrence of this incident.<p>This morning around 7:30AM US/Pacific time, a large percentage of App Engine’s load balancing infrastructure began failing. As the system recovered, individual jobs became overloaded with backed-up traffic, resulting in cascading failures. Affected applications experienced increased latencies and error rates. Once we confirmed this cycle, we temporarily shut down all traffic and then slowly ramped it back up to avoid overloading the load balancing infrastructure as it recovered. This restored normal serving behavior for all applications.<p>We’ll be posting a more detailed analysis of this incident once we have fully investigated and analyzed the root cause.<p>Regards,<p>Christina Ilvento on behalf of the Google App Engine Team<p><a href="https://groups.google.com/forum/#!topic/google-appengine-downtime-notify/SMd2pDJsCPo/discussion" rel="nofollow">https://groups.google.com/forum/#!topic/google-appengine-dow...</a>
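For anyone curious what "temporarily shut down all traffic and then slowly ramped it back up" might look like in practice, here is a minimal sketch. It is not Google's mechanism (the post gives no details); the function names `ramp_fractions` and `admit`, the linear schedule, and the hash-based bucketing are all assumptions chosen for illustration.

```python
import hashlib


def ramp_fractions(steps):
    """Return admitted-traffic fractions rising linearly from 1/steps to 1.0."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [i / steps for i in range(1, steps + 1)]


def admit(request_key, fraction):
    """Admit roughly `fraction` of requests, chosen deterministically by key.

    The key is hashed into a bucket in [0, 1); a request is admitted when its
    bucket falls below the current fraction. A key admitted at one stage stays
    admitted at every later (larger) stage.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < fraction


if __name__ == "__main__":
    # Hypothetical client keys, used only to show the admitted count growing.
    keys = [f"client-{n}" for n in range(1000)]
    for fraction in ramp_fractions(4):
        admitted = sum(admit(k, fraction) for k in keys)
        print(f"fraction={fraction:.2f} admitted={admitted}")
```

Hashing on a stable key rather than rolling a random number per request means the same clients keep getting through as the fraction grows, which avoids flapping for any individual user during the ramp.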
Meanwhile... Gmail etc are working quite fine. So the claim that if you build on GAE you "take advantage of the same infrastructure used for Google services!!" starts to ring a bit hollow.
I'm seeing a bunch of Google properties down also. Maybe they are running on App Engine? Like <a href="https://developers.google.com/" rel="nofollow">https://developers.google.com/</a>
At about 7:30am US/Pacific time this morning, Google began experiencing slow performance and dropped connections from one of the components of App Engine. Many App Engine applications are experiencing slow responses and an inability to connect to services. We currently show that a majority of App Engine users and services are affected. We are actively working on restoring service as quickly as possible.<p>We are posting regular updates to our downtime-notify list here: <a href="https://groups.google.com/forum/?fromgroups=#!topic/google-appengine-downtime-notify/SMd2pDJsCPo" rel="nofollow">https://groups.google.com/forum/?fromgroups=#!topic/google-a...</a><p>Thanks,
Christina, Google App Engine Product Manager
What's the earliest sign of trouble you've had?<p>Pingdom reports my GAE-hosted site has been down since 2012-10-26 10:37:38 EST, a bit over an hour now.<p>UPDATE: My site is back. Delayed report from Pingdom says site came back online after 50 minutes. Performance is sketchy still. We're probably not in the clear yet.<p>At least we can now get to the status dash:<p><a href="http://code.google.com/status/appengine" rel="nofollow">http://code.google.com/status/appengine</a>
It's really quite remarkable (to be honest, inexcusable is probably a better word) that their status page is failing as well. My expectations for a company with Google's resources and infrastructure are a lot higher than that.<p>Nothing on their Twitter account either: <a href="https://twitter.com/app_engine" rel="nofollow">https://twitter.com/app_engine</a><p>A poor handling of a systems failure in my opinion.
Latest update:<p>"At approximately 7:30am Pacific time this morning, Google began experiencing slow performance and dropped connections from one of the components of App Engine. The symptoms that service users would experience include slow response and an inability to connect to services. We currently show that a majority of App Engine users and services are affected. Google engineering teams are investigating a number of options for restoring service as quickly as possible, and we will provide another update as information changes, or within 60 minutes."<p><a href="https://groups.google.com/forum/?fromgroups=#!topic/google-appengine-downtime-notify/SMd2pDJsCPo" rel="nofollow">https://groups.google.com/forum/?fromgroups=#!topic/google-a...</a>
Status reports from the mailing list <a href="https://groups.google.com/d/topic/google-appengine-downtime-notify/SMd2pDJsCPo/discussion" rel="nofollow">https://groups.google.com/d/topic/google-appengine-downtime-...</a>
I'm really happy I don't host in the cloud. How quickly are the cost savings of cloud computing obliterated by PR, customer service, and system administration time when an outage like this occurs?
Dropbox [0] is showing a 500 for me too. I'm very confused as to what has just happened to the internet...<p>[0] <a href="https://www.dropbox.com/" rel="nofollow">https://www.dropbox.com/</a>
My Google contact said that 'SRE are all over it. Hope to have more details soon.' but that was about 30 minutes ago.<p>Does tumblr.com use App Engine? They're down...
Hm, bad week for the Cloud. Can't even get to the status page; hopefully it's not hosted on App Engine.<p>So going forward, what's the best way to protect against cloud downtime? Have a hot/standby failover with a different provider? Prepare customers' expectations for the possibility of server outages? Do a ton of research, pay $$$ for lots of nines uptime, and lambast the host when they don't deliver?
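On the hot/standby idea, a rough sketch of the client-side half: try the primary, fall back to a standby at a different provider when a health check fails. The names `http_probe` and `pick_backend`, the URLs, and the 3-second timeout are hypothetical. Keeping the standby's data in sync is the hard part and is not covered here.

```python
import urllib.request


def http_probe(url, timeout=3.0):
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


def pick_backend(backends, probe=http_probe):
    """Return the first backend, in priority order, that passes the probe.

    Returns None when every backend fails, so the caller can decide whether
    to serve a static error page, retry later, or alert someone.
    """
    for url in backends:
        if probe(url):
            return url
    return None


if __name__ == "__main__":
    # Placeholder URLs: primary first, standby at another provider second.
    backends = [
        "https://primary.example.com/health",
        "https://standby.example.net/health",
    ]
    print(pick_backend(backends))
```

The probe is passed in as a parameter so the selection logic can be exercised without touching the network. In a real setup this decision usually lives in DNS or a load balancer rather than in application code.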
I would love to see people at Google show the internal tools they use to detect and solve these kinds of issues. I can only imagine a war room with screens all over the place showing a gigantic number of flashing red lines :)
Hope it doesn't last long though. Just 10 minutes ago I was praising what a good choice App Engine has been so far...