AWS EC2/RDS Outage in us-east-1

212 points by jacobwg over 5 years ago

30 comments

matt2000 over 5 years ago
Just wanted to add a quick note before we get the usual deluge of "you should be running in multiple AZs and regions" posts: These outages are relatively rare and your best decision might just be to accept the tiny amount of downtime and keep your app simple and inexpensive to run.

I of course don't know the tradeoffs involved in running your system, but I know for a lot of my situations the simplicity of single AZ with a straightforward failover option is usually the right tradeoff.
btown over 5 years ago
https://status.heroku.com/incidents/1892 - it appears Heroku is being particularly affected. We've had multiple sites on multiple accounts go down in the past few minutes.

EDIT T16:31Z: It appears Heroku has failed over their dashboard, but dynos are still failing to come online. We had assumed that they had multi-region failovers for their customers. Incredibly disappointing.
bombtrack over 5 years ago
Looks to have been caused by a loss of utility power and subsequent backup generator failure at one datacenter.

> 10:47 AM PDT We want to give you more information on progress at this point, and what we know about the event. At 4:33 AM PDT one of 10 datacenters in one of the 6 Availability Zones in the US-EAST-1 Region saw a failure of utility power. Backup generators came online immediately, but for reasons we are still investigating, began quickly failing at around 6:00 AM PDT. This resulted in 7.5% of all instances in that Availability Zone failing by 6:10 AM PDT. Over the last few hours we have recovered most instances but still have 1.5% of the instances in that Availability Zone remaining to be recovered. Similar impact existed to EBS and we continue to recover volumes within EBS. New instance launches in this zone continue to work without issue.

https://status.aws.amazon.com/rss/ec2-us-east-1.rss
bdcravens over 5 years ago
I've noticed both Twitter and Reddit were having issues this morning, so this makes sense.
scott113341 over 5 years ago
I got paged 50 minutes before AWS updated their status page. We are running on AWS's managed Kubernetes offering (EKS), and about one third of our nodes were running in the affected availability zone. We were then able to move all of our traffic out of that AZ, which solved our issues. The main symptom was HTTP requests made by our backend to 3rd party APIs failing, but only on requests originating from that AZ.
groundlogic over 5 years ago
Reddit has been quite dysfunctional for me the past hour or so.
sdrothrock over 5 years ago
Amazon JUST had an EC2/RDS failure in one AZ in Tokyo last week; the cause was a bug in their HVAC that led to overheating. I wonder if this is similar or just coincidental.

https://aws.amazon.com/jp/message/56489/
xyst over 5 years ago
The Spinnaker project is looking more appealing with every outage. Outage detected in X provider in Y region? Deploy infrastructure to Z provider in Y region.
somehowadev over 5 years ago
I’m surprised by how much of the “internet” seems to be affected by a single AZ going down.
nemothekid over 5 years ago
us-east-1 continues to have consistently worse uptime than other regions (likely for good reason, too: it continues to be the default region).

I've avoided that region and I can't remember the last time I had downtime caused by Amazon.
JacobJans over 5 years ago
Leaseweb Virginia is having a major outage as well. Maybe it is related?

https://www.leasewebstatus.com/incidents/updated-connectivity-issues-in-part-of-our-network/ci25t2jr
ihaveajob over 5 years ago
Copy that. Happy Labor Day weekend everyone.
colinbartlett over 5 years ago
This seems to affect a broad swath of the internet, perhaps because the us-east-1 region is so popular? My side project StatusGator shows approximately 15% of the status pages we monitor (including our own) with a warn or down notice right now, a sizable spike over the baseline.
riffic over 5 years ago
> We are investigating connectivity issues affecting some instances in a single Availability Zone in the US-EAST-1 Region.

Well there’s your problem, people. Use multiple AZs.
crb002 over 5 years ago
Curious. Lambda not affected. EC2 being physically tied to a box does introduce extra risk I hadn't thought of.
jgalt212 over 5 years ago
This is a pretty good common-sense post on not having your failure modes correlate with your client's failure modes.

https://trackjs.com/blog/separate-monitoring/

I don't work for any of the entities mentioned.
abathur over 5 years ago
Had an app doing fine until about 12 minutes ago, when Heroku tried to move it to a new server. Alas.
whalesalad over 5 years ago
For folks here, my RDS instances in us-east-1f are doing okay (knock on wood!) Not sure which AZ is suffering most.

My client's Heroku instances are online, thankfully.

Can anyone here speak to their experience with the Ohio region? I'm considering leaning on that more and more.
doiwin over 5 years ago
Is there no way at all to reach Amazon EC2 instances in us-east-1, or is just the default route to the internet broken?

Is there any way for the owners of the instances to reach them?
shamalinga over 5 years ago
Is this why Reddit and Duolingo weren't working properly? I've had issues since 9pm Sydney time so about 4 hours or so.
karmakaze over 5 years ago
I remember reading about how not all AWS regions are similarly operated and that one was a snowflake. Is it US-East-1?
nrxr over 5 years ago
Has anyone else noticed that there never seem to be outages in us-east-2 and somehow everyone keeps putting instances in -1?

Why?
odiroot over 5 years ago
Funnily enough Heroku in Europe also seems to be malfunctioning. Cannot deploy my app for at least an hour now.
bjornsteffanson over 5 years ago
I'm in Australia and Reddit/Twitter ground to a standstill - request timeout after request timeout. I presumed it was an outage somewhere but was surprised to learn it was with AWS us-east-1? I would have thought surely that my connection would have referenced a different region based on my location.
patrickaljord over 5 years ago
That must be why reddit and twitter are failing on me.
holykin over 5 years ago
It looks like it was localized to zone D.
beardedman over 5 years ago
Aha. Experienced some NPM lag too.
fibers over 5 years ago
Is that why XDA Developers doesn't work?
smitty1e over 5 years ago
My little instance died and I had to bring it back from the image.

Glad to know that it wasn't anything personal over any Hacker News gags I've done.
rvz over 5 years ago
Well, this outage says something about the companies that religiously depend on it.

If your entire service just went down as soon as this happened, Congratulations! You didn't deploy in multiple regions or think about a failsafe/fallback option that redirects from your affected service or instance.