Newest update (10:29 PM PDT): We can confirm a portion of a single Availability Zone in the US-EAST-1 Region lost power. We are actively restoring power to the affected EC2 instances and EBS volumes. We are continuing to see increased API errors. Customers might see increased errors trying to launch new instances in the Region. Source: http://status.aws.amazon.com/?rf
Or: http://status.aws.amazon.com/rss/ec2-us-east-1.rss
We learnt our lesson the hard way after the great AWScalypse of April 2011. The lesson: use n>1 hosting companies (even if one of them promises a-z-multiregion-distributed-fault-tolerant-back-up).
In my experience, N. Virginia is by far the least reliable EC2/EBS region. Fortunately our app servers are spread across two zones in the region, but our DB server is just a lone master, and our slave is down. Very nervous.
This is why AppFog (http://AppFog.com/) is investing in multiple IaaS providers and is not being hit nearly as hard. You can still sign up and even create apps.
Our instances are still down.
The AWS service health says:
9:27 PM PDT We continue to investigate this issue. We can confirm that there is both impact to volumes and instances in a single AZ in US-EAST-1 Region. We are also experiencing increased error rates and latencies on the EC2 APIs in the US-EAST-1 Region.
So, Amazon has said since the introduction of EC2 that, to ensure really high uptime, customers should use multiple availability zones and architect their applications to survive an outage in a single availability zone. While I would question Amazon's competence if outages of any sort were overly frequent, Amazon has not had many at all, and no recent cross-AZ ones. [This is correct, right?] I recognize that architecting applications to be performant across datacenters (tolerant of relatively high-latency replication) is hard, but Amazon seems to be a poster child for keeping its promises with respect to availability. Is my take on this incorrect?
I wonder if the power outage here has anything to do with this: http://www.dom.com/storm-center/dominion-electric-outage-map.jsp
Still down here, over 12 hours at this point. This is probably the second time we've been hit with something on AWS in the last three months, and you have to pay them to talk to someone about it. We're definitely moving to Linode ASAP.
Still having problems starting a few instances. We just started a campaign, so I thought there were performance issues with our application, and it took me a while to check for EC2 issues. Sigh.
Does anyone else find it strange that two Heroku posts made the front page considerably (in relative terms, obviously) earlier than "EC2 down"? I would think EC2 is a more common denominator for people, but maybe other hosts have better redundancy and so there wasn't an immediate awareness? Or am I just overly curious, and it's really just that some Heroku clients happened to notice before an at-large EC2 customer did? Edit: On rereading, I don't mean to imply a conspiracy of any sort. I'm merely curious whether there are just that many Heroku users on HN in particular, or something like that.