Tell HN: AWS appears to be down again

863 points by riknox, over 3 years ago
Console is flickering between "website is unavailable" and being up for my team. This is happening very frequently just now; reliability seems to have taken a hit.

74 comments

aledalgrande, over 3 years ago
If you haven't seen it yet, the news is that it was a power loss:

> 5:01 AM PST We can confirm a loss of power within a single data center within a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. This is affecting availability and connectivity to EC2 instances that are part of the affected data center within the affected Availability Zone. We are also experiencing elevated RunInstance API error rates for launches within the affected Availability Zone. Connectivity and power to other data centers within the affected Availability Zone, or other Availability Zones within the US-EAST-1 Region are not affected by this issue, but we would recommend failing away from the affected Availability Zone (USE1-AZ4) if you are able to do so. We continue to work to address the issue and restore power within the affected data center.
ItsBob, over 3 years ago
I've built out many 42U racks in DCs in my time and there were a couple of rules that we never skipped:

1. Dual power in each server/device. One PSU was powered by one outlet, the other PSU by a different one with a different source, meaning that we can lose a single power supply/circuit and nothing happens.
2. Dual network (at minimum), for the same reasons as above, since the switches didn't always have dual power in them.

I've only had a DC fail once, when the engineer was performing work on the power circuitry for the DC and thought he was taking down one circuit, but it was in fact the wrong one and he took both power circuits down at the same time.

However, a power cut (in the traditional sense where the supplier has a failure so nothing comes in over the wire) should have literally zero effect!

What am I missing?

I've never worked anywhere with Amazon's budget, so why are they not handling this? Is it more than just the incoming supply being down?
Hippocrates, over 3 years ago
Every time a major cloud provider has an outage, infra people and execs cry foul and say we need to move to <the other one>. But does anyone really have an objective measure of how clouds stack up reliability-wise? I doubt it, since outages and their effects are nuanced. The other move is that they want to go multi-cloud... But I've been involved in enough multi-cloud initiatives to know how much time and effort those soak up, not to mention the overhead costs of maintaining two sets of infra sub-optimally. I would say that for most businesses, these costs far exceed that of the occasional six-hour-long outage.
hnarn, over 3 years ago
Is there a history of AWS downtimes available somewhere? This makes what, three times in as many months?

edit: The question isn't necessarily AWS-specific; any data on the amount of downtime per cloud provider on a timeline would be nice.
andyjih_, over 3 years ago
The most hilarious irony of not being able to acknowledge a 4AM page in the PagerDuty mobile app because AWS is down.
JCM9, over 3 years ago
AWS didn’t “go down”. They had an outage in one AZ, which is why there are multiple AZs in each region. If your app went down then you should be blaming your developers on this one, not AWS. Those having issues are discovering gaps in their HA designs.

Obviously it’s not good for an AZ to go down, but it does happen, which is why any production workload should be architected to have seamless failover and recovery to other AZs, typically by just dropping nodes in the down AZ.

People commenting that servers shouldn’t go down, etc., don’t understand how true HA architectures work. You should expect and build for stuff to fail like this. Otherwise it’s like complaining that you lost data because a disk failed. Disks fail… build architecture where that won’t take you down.
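To make "just dropping nodes in the down AZ" a bit more concrete, here is a minimal sketch. It assumes a hypothetical in-memory fleet and a set of AZ IDs that something else has already flagged as unhealthy; the `Node`, `routable_nodes`, and `pick_node` names are invented for illustration and are not an AWS API. A real setup would more likely rely on load balancer health checks than on hand-rolled routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    az: str  # availability zone ID, e.g. "use1-az4"


def routable_nodes(nodes, unhealthy_azs):
    """Return only the nodes that are not in an AZ marked unhealthy."""
    return [n for n in nodes if n.az not in unhealthy_azs]


def pick_node(nodes, unhealthy_azs, request_id):
    """Spread requests round-robin across the surviving nodes."""
    candidates = routable_nodes(nodes, unhealthy_azs)
    if not candidates:
        raise RuntimeError("no healthy nodes left in any AZ")
    return candidates[request_id % len(candidates)]


# Hypothetical fleet with one node per AZ.
fleet = [
    Node("web-1", "use1-az4"),
    Node("web-2", "use1-az2"),
    Node("web-3", "use1-az6"),
]

# With use1-az4 marked down, traffic only goes to web-2 and web-3.
survivors = routable_nodes(fleet, {"use1-az4"})
print([n.name for n in survivors])
```

The sketch only works if capacity in the remaining AZs can absorb the extra load, which is the part that tends to be missed in practice.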
IceWreck, over 3 years ago
Honestly, my server at home has more uptime than US-East-1
RONROC, over 3 years ago
The prevailing wisdom throughout the last couple of years was:

“ditch your on-prem infrastructure and migrate to a major cloud provider”

And it's starting to seem like it could be something like:

“ditch your on-prem infrastructure and spin up your own managed cloud”

This is probably untenable for larger orgs where convenience gets the blank check treatment, but for smaller operations that can’t realize that value at scale and are spooked by these outages, what are the alternatives?
potas, over 3 years ago
Slack seems to have some issues because of that. I'm not sure if anyone is receiving messages, as it has been completely silent for the last 15 minutes or so.
izietto, over 3 years ago
I guess that's why I'm experiencing weird issues with Heroku:

    remote: Compressing source files... done.
    remote: Building source:
    remote:
    remote:  !     Heroku Git error, please try again shortly.
    remote:  !     See http://status.heroku.com for current Heroku platform status.
    remote:  !     If the problem persists, please open a ticket
    remote:  !     on https://help.heroku.com/tickets/new
vegai_, over 3 years ago
5ish years ago it was common knowledge that us-east-1 is generally the worst place to put anything that needs to be reliable. I guess this is still true?
dolibasija, over 3 years ago
One of our EC2 instances in us-east-1c is unavailable and stuck in the "stopping" state after a force stop. Interestingly enough, EC2 instances in us-east-1b don't seem to be affected.

The console is throwing errors from time to time. As usual, no information on the AWS status page.
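One thing worth checking here: AZ names such as us-east-1c are mapped per account, while AZ IDs such as use1-az4 refer to the same physical zone for everyone, so one account's us-east-1c may or may not be the zone other people are reporting. A small sketch for looking that up, assuming a response shaped like what boto3's EC2 `describe_availability_zones` call returns. The sample mapping below is made up, and the `zone_names_for_id` helper is hypothetical:

```python
def zone_names_for_id(response: dict, zone_id: str) -> list[str]:
    """Return the account-local AZ names that map to the given AZ ID."""
    wanted = zone_id.lower()
    return [
        zone["ZoneName"]
        for zone in response.get("AvailabilityZones", [])
        if zone.get("ZoneId", "").lower() == wanted
    ]


# Hypothetical sample response. A real one would come from:
#   boto3.client("ec2", region_name="us-east-1").describe_availability_zones()
sample_response = {
    "AvailabilityZones": [
        {"ZoneName": "us-east-1a", "ZoneId": "use1-az6"},
        {"ZoneName": "us-east-1b", "ZoneId": "use1-az1"},
        {"ZoneName": "us-east-1c", "ZoneId": "use1-az4"},
    ]
}

# Which of "my" AZ names corresponds to the zone in the status update?
print(zone_names_for_id(sample_response, "USE1-AZ4"))
```

The lookup is case-insensitive because status updates tend to write the ID in upper case (USE1-AZ4) while API responses use lower case.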
ClumsyPilot, over 3 years ago
Now that everyone and their dog is on AWS, it is not just 'a website stops working'; half the world, from telephones to security doors and IoT equipment, stops working.

I am not sure if the movement to the cloud has reduced the number of failures, but it definitely has made these failures more catastrophic.

Our profession is busy making the world less reliable and more fragile. We will have our reckoning, just like the shipping industry did.
schnebbau, over 3 years ago
So, how many execs are going to push to move to self-managed hosting in the new year?

Packaging a way to migrate off AWS could be a unicorn idea.
rsp1984, over 3 years ago
Bitbucket having issues too: https://bitbucket.status.atlassian.com/
captn3m0, over 3 years ago
4:35 AM PST We are investigating increased EC2 launched failures and networking connectivity issues for some instances in a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. Other Availability Zones within the US-EAST-1 Region are not affected by this issue.

via https://stop.lying.cloud/
mule1, over 3 years ago
Feel for devops peeps who are just trying to chill for Christmas
stunt, over 3 years ago
It seems that it's due to a power loss.

[05:01 AM PST] We can confirm a loss of power within a single data center within a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. This is affecting availability and connectivity to EC2 instances that are part of the affected data center within the affected Availability Zone. We are also experiencing elevated RunInstance API error rates for launches within the affected Availability Zone. Connectivity and power to other data centers within the affected Availability Zone, or other Availability Zones within the US-EAST-1 Region are not affected by this issue, but we would recommend failing away from the affected Availability Zone (USE1-AZ4) if you are able to do so. We continue to work to address the issue and restore power within the affected data center.
pawelduda, over 3 years ago
Bitbucket is affected; pages randomly take forever to load or return 500
darkwater, over 3 years ago
Fields of green here: https://status.aws.amazon.com/ Anyway, I can access the web console with no issue (eu-west)
anshumankmr, over 3 years ago
If AWS, GCP and Azure go down, we will be back in the stone ages, right?
omosubi, over 3 years ago
I do wonder if the great resignation has anything to do with this. My team (no affiliation with Amazon) was cut in half from last year and we are struggling to keep up with all the work
sctgrhm, over 3 years ago
Invision image uploads are down too because of this: https://status.invisionapp.com/
camdenreslink, over 3 years ago
Who needs chaos monkey? Just host on AWS for a similar effect.
gtsop, over 3 years ago
Question to the sysadmins here: Is it really that outrageous of Amazon to have such issues, or are people way too spoiled to appreciate the effort that goes into maintaining such a service?

Edit: Not supporting Amazon, I generally dislike the company. I just don't understand the extent to which the criticism is justified
rswail, over 3 years ago
So why are people not migrating out of us-east-1? Operating in ap-southeast, we weren't that affected by the us-east-1 downtime, although our system is reasonably static and doesn't make lots of IAM calls (which seem to be a large SPOF from us-east-1).
reactive55, over 3 years ago
Bitbucket is down as well because of this. https://bitbucket.status.atlassian.com/incidents/r8kyb5w606g5
sprite, over 3 years ago
My Elastic Beanstalk instances are completely unreachable. Seems at the very least ELB is down. Looking at Downdetector, it looks like this is taking a bunch of sites down with it. As usual, the AWS status page shows all green.
exabrial, over 3 years ago
As an industry, can we please stop making products like vacuums that can't operate unless someone else's computer is working in a field in Virginia? There's literally no reason for it.
antihero, over 3 years ago
I wonder how many 9s AWS is going for. Can't be a lot of 9s anymore.
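For a sense of scale on the "how many 9s" question, here is a rough back-of-the-envelope sketch. It only converts between a number of nines and a yearly downtime budget; it says nothing about AWS's actual SLA terms or measured availability, and the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (3 -> 99.9%)."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


def availability_percent(downtime_minutes: float) -> float:
    """Yearly availability, in percent, given total downtime in minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} min/year")

# An illustrative single six-hour outage in a year:
print(f"{availability_percent(6 * 60):.4f}%")
```

By this arithmetic, one six-hour outage in a year (360 minutes) still fits inside a three-nines budget of about 525.6 minutes, but is far beyond a four-nines budget of about 52.6 minutes.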
loudtieblahblah, over 3 years ago
Yay! Adult snowday!
exogenousdata, over 3 years ago
Looks like the SEC's EDGAR website is affected. This is the site the SEC uses to post the filings of public companies. Normally there are a hundred or more company filings in the morning starting at 6am ET. This morning there are two.

https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent
debarshri, over 3 years ago
Hubspot seems to be down too [0].

[0] https://status.hubspot.com/
amai, over 3 years ago
Thank goodness we host all IT services in the same cloud. Imagine the chaos we would have if everything did not fail at the same time.
iso1631, over 3 years ago
Ahh, the cloud

https://imgflip.com/i/5yrt24
lukeqsee, over 3 years ago
I can't get to the console either, receiving a "Temporarily unavailable" notice without branding.
sascha_sl, over 3 years ago
quay.io is also dead, as well as giphy, some parts of slack

just the weekly internet apocalypse, happy holidays fellow SREs
richardfey, over 3 years ago
As far as I understand, a whole availability zone went down; today is also the day a lot of people learn why "multi-AZ" matters, so I don't think it's fair to say that services are down because the whole of AWS is down.
jakub_g, over 3 years ago
Where are you located? "X is down" without location is only moderately useful.

I'm having issues with Slack from central EU (Poland) -- can't upload images or send emoji reactions to posts (curiously, text works fine). Wondering if it's linked
kemals, over 3 years ago
Here is The Internet Report episode on the recent AWS outages, covering the outages and their root causes: https://youtu.be/N68pQy8r1DI
bob1029, over 3 years ago
2 of our servers are fucked right now. VOIP services down.

Only with AWS and GitHub do I seem to get panicked text messages on my phone first thing in the morning... Our workloads on Azure typically only have faults when everyone is in bed.
fipar, over 3 years ago
https://downdetector.com/status/aws-amazon-web-services/
devoutsalsa, over 3 years ago
We'll never really know the answer, but I have to wonder what percentage of comments on this thread are from Amazon downplaying the severity & other cloud providers hyping it up.
j10c, over 3 years ago
I also had a problem loading YouTube at the same time (for 10-15 minutes). It looks like a coincidence, but who knows if Google uses some of the infrastructure from AWS.
pkulak, over 3 years ago
I used to think it was silly to have your own hardware (like a NAS) in your house. What makes you think you can do it better than AWS?

Santa is bringing me a Synology in three days.
RobertKerans, over 3 years ago
Assuming crates.io is AWS-backed? Getting a fun situation where direct dependencies of an application are downloading but then the sub-dependencies aren't.
kingsloi, over 3 years ago
Of all the AWS outages, my team and I have dodged them all, except this one. 3 instances down and unavailable

> Due to this degradation your instance could already be unreachable

>:(
bobviolier, over 3 years ago
Seems illogical that this is just a single AZ in a single US region. We are having issues pulling images from public.ecr.aws from an EU region.
l0b0, over 3 years ago
Meta: I posted a "PyPI is down" link a few days ago, and the post got insta-flagged. Is there some rule about this sort of thing?
sswaner, over 3 years ago
Not down as of 7:40 EST. US-EAST-1 hosted site (athene.com). Cognito, API Gateway, Lambda, S3, DynamoDB, RDS, CloudFront.
throwaway875487, over 3 years ago
Our RDS instances have completely packed up. Hell knows what's going on. Here come the customer support tickets.
anonu, over 3 years ago
Better polish up your BCP docs. People will be asking for them quite a bit more in the new year.
sprite, over 3 years ago
My app running on AWS is currently down. Having intermittent problems with the console as well.
streamofdigits, over 3 years ago
Somebody call the IT department
allocate, over 3 years ago
Also running a big production app in us-east-1 and we're experiencing issues.
throwaway81523, over 3 years ago
Ok, enough AWS outages to say I'm tired of hearing about low end stuff being flaky.
bognition, over 3 years ago
What a way to start my day
300bps, over 3 years ago
Can we please stop saying, “AWS is down”?

AWS consists of over 200 services offered in 86 availability zones in 26 regions, each with their own availability.

If one service in one availability zone being impaired equals a post about “AWS is down”, we might as well auto-post that every day.
biznickman, over 3 years ago
Why isn't Heroku showing a status error despite being offline?
sreitshamer, over 3 years ago
Console is sluggish for me, but S3 (us-east-1) seems to work fine.
ChrisMarshallNY, over 3 years ago
I can't play Borderlands 3 this morning (Epic).

Wonder if it's connected?
13daug, over 3 years ago
This S3 how you gonna get you investment back from it
networkisfine, over 3 years ago
Isn't the point of designing an availability zone with multiple data centers that if a single data center in the availability zone fails, services aren't affected?
Demcox, over 3 years ago
Imgur is suffering from this too, I think.
amai, over 3 years ago
A problem with log4j/Log4Shell?
whoomp12342, over 3 years ago
the cloud is great they said...
tomerbd, over 3 years ago
Rumble was up all this time.
reactive55, over 3 years ago
Bitbucket is down as well
exabrial, over 3 years ago
Stat That.
quantumfissure, over 3 years ago
Me: *Hesitation at last job moving absolutely everything (including backups) to AWS because if it goes down it's a problem* I'm a firm believer in *some kind of* physical/easily accessible backup.

Coworkers: "You're an f'n idiot. Amazon and Facebook don't go down, you're holding us back!" <- Quite literally their words.

Me: *leaves 'cause that treatment was the final straw*

Amazon and Facebook both go down within a month of each other, and supposedly they needed backups

Them: *shocked pikachu face*
CaptRon, over 3 years ago
At least HN works.
sydthrowaway, over 3 years ago
Switch to Azure
clavicat, over 3 years ago
How much more frequent do these outages need to become before they start triggering SLA limits?
sh4un, over 3 years ago
Damn you all eggs in one basket.