My side project StatusGator monitors status pages (including IBM's ill-fated page) and I'm seeing more than 10% of the nearly 800 services we monitor having an outage right now.<p>So it appears to affect anyone who depends on IBM Cloud.
So what are HNers using IBM Cloud for and where do you see that it has an edge over AWS offerings (where an overlap exists, obviously)?<p>(I figure either you’re in devops and too busy putting out fires to read this thread, or you’re not and your work is halted by the incident, so you might have time to read and reply ;)
All of Broadcastify's audio servers (hosted with Softlayer in their Dallas datacenter) are completely unreachable and down.<p>I'm going to wait a bit to see if we get a status update, otherwise we'll be spinning up instances on AWS to failover (which will be enormously costly for bandwidth)<p>No status, no nothing, we're in the dark.
I remember I was at an IBM-sponsored hackathon around 2015 where it was a requirement to use Bluemix. Over the course of the weekend the service went down for hours three times.<p>Literally this morning I was wondering what ever happened to it, like did it die a quiet death? Oh, it rebranded to IBM Cloud in 2017. Now this news.<p>I think there's an eponymous law named for this sort of thing.
In the Cloud Status History page scroll down to the 6:32 entry that says "Unable to Access IBM Cloud"<p><a href="https://cloud.ibm.com/status?selected=history" rel="nofollow">https://cloud.ibm.com/status?selected=history</a><p>- 2020-06-10 02:19 UTC - RESOLVED - The network operations team adjusted routing policies to fix an issue introduced by a 3rd party provider and this resolved the incident
I generally do everything on AWS or GCP, with a little Azure sometimes for personal projects. In what world does IBM beat one of those three in anything? Genuinely curious: how are they able to stay competitive?
Fixed it for you <a href="https://github.com/ibm-cloud-docs/overview/pull/74" rel="nofollow">https://github.com/ibm-cloud-docs/overview/pull/74</a>
Honest slightly cynical question: most probably someone inside the responsible team said some day that it would be very stupid to host the status page inside the same infrastructure being monitored, but they were probably ignored... what should that person do now? Say "toldya!" out loud in the postmortem meeting or simply shut up and move on because reality is that we are hired to do some stupid task and not to think for ourselves?
I received communication ~15min ago that they're actively looking into the issue. I submitted the ticket roughly 20min ago. So it seems they're aware.<p>It doesn't help that their status page is also hosted on IBM Cloud.
Found this from a user on Twitter - "Our status page for IBM Aspera is on StatusPage, so you can track here as a bank shot: <a href="https://status.aspera.io" rel="nofollow">https://status.aspera.io</a> "
Seems pretty dumb to host a status page in a way that it could go down, when it should be a static page that is trivially hosted on CDNs worldwide.
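For what it's worth, here is a minimal sketch of the "static page" idea: render the status page to a plain HTML string that any CDN or object store could serve, so the page has no runtime dependency on the platform it reports on. The function name `render_status_page` and the `{service_name: is_up}` input shape are made up for illustration; how the checks are collected and how the file gets uploaded are left out on purpose.

```python
import html
from datetime import datetime, timezone


def render_status_page(checks, generated_at=None):
    """Render a self-contained HTML page from a {service_name: is_up} dict.

    Hypothetical helper: the input shape and markup are assumptions,
    not anyone's actual status page format.
    """
    generated_at = generated_at or datetime.now(timezone.utc)

    # One list item per service, sorted so the output is stable between runs.
    rows = []
    for name, is_up in sorted(checks.items()):
        state = "operational" if is_up else "outage"
        rows.append(f"<li>{html.escape(name)}: {state}</li>")

    if all(checks.values()):
        overall = "All systems operational"
    else:
        overall = "Some systems are down"

    return (
        "<!doctype html>\n"
        '<html><head><meta charset="utf-8"><title>Status</title></head><body>\n'
        f"<h1>{overall}</h1>\n"
        f"<p>Last updated {generated_at.strftime('%Y-%m-%d %H:%M UTC')}</p>\n"
        "<ul>\n" + "\n".join(rows) + "\n</ul>\n"
        "</body></html>\n"
    )
```

The output is just a string, so a cron job running somewhere outside the monitored infrastructure could write it to a file and push it to whatever static host you trust to stay up independently.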
The most infuriating thing about this is the ZERO communication coming out of IBM Cloud. No emails. No updates to twitter. Status page down. Support lines clogged.<p>At least give me something I can point my customers at to show them this is not due to my incompetence.
Haha, Amazon had the same problem a few years ago when they had a fire in a datacenter: their status page was hosted in the same building and showed everything as fine, while thousands of websites hosted on AWS were down.
How sure are we that this outage is limited to IBM Cloud?<p>Pingdom[1] had a spike of website outages from 11k => 27k.<p>[1] <a href="https://livemap.pingdom.com/" rel="nofollow">https://livemap.pingdom.com/</a>
Ah, is this the exception that proves the rule that "no one was ever fired for buying IBM"?<p>Sorry to be glib, I'm sure it's a tough time for people who were sold on their cloud platform and work on it!
Hugops.<p>Hope they get a root cause and a quick fix. I’m not a fan of their cloud service but I know people working on the outage and fix are stressed.
About a month ago their Northern Virginia region was down. All the BGP prefixes associated with it disappeared from the internet (routes withdrawn). This time (I went to check when someone mentioned it) they kept advertising, but all traffic went nowhere once it got into their network. Curious to see if there is an RFO released.
This looks related (smoking gun?) <a href="https://status.aspera.io/incidents/t9r03x71dxkl" rel="nofollow">https://status.aspera.io/incidents/t9r03x71dxkl</a><p>>> A 3rd party network provider was advertising routes which resulted in our WW traffic becoming severely impeded.
Even weather.com was down, and someone broke eBay too<p><pre><code> Fastly error: unknown domain: www.ebay.com. Please check that this domain has been added to a service.</code></pre>
A quick check of Cloudflare's isbgpsafeyet page:<p>IBM Cloud - unsafe<p>At least AWS signs their routes I think.<p>If you can't even sign your own routes - hard to have a ton of pity.
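To make "signing routes" concrete: assuming that refers to RPKI route origin authorizations (ROAs), the check a validating network runs on each announcement looks roughly like the sketch below, which follows the origin validation logic described in RFC 6811. The `ROA` tuple and `validate_origin` function are hypothetical names, the prefixes and AS numbers are documentation-range values, and a real validator also handles details this skips (AS 0, certificate validation, fetching ROAs from the repositories).

```python
import ipaddress
from typing import List, NamedTuple


class ROA(NamedTuple):
    """Illustrative ROA record: who may originate a prefix, and how specific."""
    prefix: str       # authorized prefix, e.g. "192.0.2.0/24"
    max_length: int   # longest prefix length the holder allows to be announced
    asn: int          # AS number authorized to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Return "valid", "invalid", or "not-found" for a BGP announcement."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA covers the route if the route is the same prefix or a subnet of it.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches when the origin AS agrees and the announced
        # prefix is not more specific than max_length permits.
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


roas = [ROA("192.0.2.0/24", 24, 64496)]
print(validate_origin("192.0.2.0/24", 64496, roas))     # valid
print(validate_origin("192.0.2.0/24", 64511, roas))     # invalid (wrong origin AS)
print(validate_origin("192.0.2.0/25", 64496, roas))     # invalid (more specific than max_length)
print(validate_origin("198.51.100.0/24", 64496, roas))  # not-found (no covering ROA)
```

Publishing ROAs only helps if other networks actually drop the "invalid" announcements, so both the signing and the filtering have to happen for a bad advertisement to be contained.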