Can you imagine if Twitter and Google went down at the same time?<p>People would be reactivating their Facebook accounts and having to sift through conspiracy theory posts about Hillary Clinton still just to figure out what was going on.<p>Edit: The points on this post keep going up and down every time I check these comments. Yes, it was sarcasm, I was joking, but I was trying to point out that most people rely on a small set of services. "Cloud" has centralized things a lot.
Could this be related to the storm?<p>I was out shoveling, and came back in to my phone blowing up. Our systems at IronMountain (formerly Fortrust) in Denver all rebooted at once. These are all on redundant power, with each system's redundant power supplies connecting to different circuits entering the cabinet, and those two circuits fed from 3 PDUs (two separate, one shared). Each of those is supposed to be fed by a separate UPS and generator. The last status update I had says that they are running off generators, but they've been shockingly tight-lipped about it.<p>Don't get me wrong, it was hi-LAR-ious to call into their NOC and have them pretend that I was the only one having problems. "Can you tell me if there is a major data center outage going on?" "We are trying to gather information, we are making a bunch of client phone calls, we will know after we make those calls." "... Why are you making a bunch of client calls if you aren't having an outage?"
I'm interviewing for a Production Engineer role at Facebook on Monday, thanks for providing relevant "do you have any questions for us" content.
I’ve seen many systems go down over the last few days worldwide. Aside from the possibility of a mega-DDoS attack (which Facebook denies), all of these organizations have fairly diverse tech stacks to my knowledge. Google’s issue (supposedly) had to do with their Blobstore API, we don’t know what happened with Facebook, and many other, smaller services have had issues as well, including three intranet services at my workplace.<p>This leaves me wondering what software all these places have in common. The application layers are all different, the databases are all different, the containerization and provisioning systems are different, but I imagine that all these systems rely on two things: the global Internet backbone, and maybe the Linux kernel.<p>Have there been major security vulnerabilities patched lately in the Linux kernel that could have had unintended consequences?
Facebook's own status dashboard (<a href="https://developers.facebook.com/status/dashboard/" rel="nofollow">https://developers.facebook.com/status/dashboard/</a>) showed no issues or outage just 30 min ago.<p>I run a messenger bot platform - the webhooks stopped being delivered _hours_ ago... nothing on their status page until it had been down for hours.<p>Their current issue...<p>"We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution."<p>What? lmao
It looks like something much larger is going on. If you look at the front page of <a href="https://downdetector.com/" rel="nofollow">https://downdetector.com/</a> you'll see most major sites/backbones are having issues (Verizon/ATT/Sprint/CenturyLink/TMobile/Comcast/Level3/etc).
So yesterday Google had a major (and out of character) outage across its apps, and today Facebook has a major (and also out of character) outage across its apps.<p>I can't wait to see the RCA for both of these and if they're related.
This is bigger than Facebook.<p><a href="https://imgur.com/a/gePwi0i" rel="nofollow">https://imgur.com/a/gePwi0i</a><p><a href="https://www.akamai.com/us/en/resources/visualizing-akamai/real-time-web-monitor.jsp" rel="nofollow">https://www.akamai.com/us/en/resources/visualizing-akamai/re...</a>
Let's see whether we have a spike in the birth rate in 9 months.<p>(Oh, turns out the Great Blackout Baby Boom was a myth:<p><a href="https://www.snopes.com/fact-check/from-here-to-maternity/" rel="nofollow">https://www.snopes.com/fact-check/from-here-to-maternity/</a> )
"This usually means we're making an improvement to the database your account is stored on. While this process won't affect your account, you temporarily won't be able to access the site." <a href="https://www.facebook.com/help/134401680031995" rel="nofollow">https://www.facebook.com/help/134401680031995</a><p>I guess that this is all that I will get. Facebook is never down, it is just making improvements (like restarting the services to make them work again).
What manner of failure would cause such globally deployed and distributed systems to go down like this? I'm very interested to read up on this when they release details of the failure.
If you use their API and haven't seen it yet, their issue is listed here on their status page:<p><a href="https://developers.facebook.com/status/issues/559896447845433/" rel="nofollow">https://developers.facebook.com/status/issues/55989644784543...</a>
The real storm is realizing that you cannot access your affiliate accounts because they log in through Facebook OAuth. Take this as a caution to move your accounts away from Facebook.<p>Edit: Or have other login methods rather than relying only on Facebook authentication.
Serious question: Was any value lost? (this may appear sarcastic)<p>Facebook obviously loses some ad revenue and Facebook customers may lose sales. But do Facebook/Instagram users suffer? How does losing social media for several hours affect the quality of life of users?
I've also seen issues uploading images to Whatsapp in the past half hour. I wonder if it has anything to do with the Google Cloud Storage outage that took down Gmail yesterday?
The only things I can think of that would cause an outage of this scale are either a T1 center outage or (conspiracy hat on) a major hack that everyone is rushing to patch.<p>It would be interesting to read the post mortem regardless, if there is one.
<a href="https://www.facebook.com/platform/api-status/" rel="nofollow">https://www.facebook.com/platform/api-status/</a> still returns "Facebook Platform is Healthy", but you can't even load <a href="https://developers.facebook.com/status/dashboard/" rel="nofollow">https://developers.facebook.com/status/dashboard/</a>. Why have status pages if they are so susceptible to going down themselves?
So yesterday Google had a major (and out of character) outage, and today Facebook has a major (and also out of character) outage.<p>I can't wait to see the RCA for both of these and if they're related.
Instagram seems to load the feed here fine (EU), but doesn't allow you to log in from any device or post anything new. FB is totally fine if you are logged in for reading, but also can't log in if logged out.<p>VPN to US, insta can login, but still not post.<p>Distributed services are weird man!
Coincidentally, just watched The Social Network, the plot of which includes that quote by Mark:<p>> Let me tell you the difference between Facebook and everybody else. We don't crash ever! If the servers are down for even a day, our entire reputation is irreversibly destroyed. <…><p>> Even a few people leaving would reverberate through the entire user base. The users are interconnected. That is the whole point. College kids are online because their friends are online, and if one domino goes, the other dominos go.
Doesn’t their world-class team make such a long outage quite unlikely? How hard would it be to devote ample resources to a cover story for the “incident report”? Is the timing relative to the plethora of indictments relevant at all? Is it reasonable to think this may be related to shredding of data and/or code, or even cooperation to turn over data to the government in a secret deal?
Minor update: <a href="https://twitter.com/facebook/status/1105907126424109056?s=21" rel="nofollow">https://twitter.com/facebook/status/1105907126424109056?s=21</a><p>> We're focused on working to resolve the issue as soon as possible, but can confirm that the issue is <i>not related to a DDoS attack</i>.
Look at how many service providers have increased incidents reported here: <a href="https://downdetector.com/" rel="nofollow">https://downdetector.com/</a><p>My bet is that people are having problems with FB/Insta and immediately assuming that the whole internet is messed up.
> The team at Jefferies remains reasonably positive, and in the firm's top growth stock calls for the week we found four tech stocks that are offering more aggressive accounts good entry points. Carl Court / Getty Images<p>What's that weird tagline about?
I got my github two factor auth SMS two hours late. Fortunately it was just my old laptop. I wonder if it was related. Good reminder to set up an authenticator app on my new phone so I don’t have to rely on SMS!
Whatever happens right now at Facebook is less important than the fact they will never say what affected them. Of course nobody would say 'hey, outage right now due to 0day / mistake' but...
the issue is also in EU:
Is Facebook down? Messenger, site, app and Instagram hit by issues[1]<p>[1]<a href="https://www.manchestereveningnews.co.uk/news/uk-news/facebook-down-app-website-crash-15969732" rel="nofollow">https://www.manchestereveningnews.co.uk/news/uk-news/faceboo...</a>
First thought when I heard the news was BGP hijacking (ignoring whether accidental or deliberate). Doesn't the symptom fit other known cases like the Telegram incident in Iran last year, just at a larger scale?<p>Admittedly networking is not my strength, so perfectly happy for someone to shoot down this hypothesis.
I haven’t been able to post anything on Facebook, neither a new post to my wall nor a comment on a friend’s post, since mid-morning US/Eastern and this is still the case. In addition I can’t log in to the site - I am able to access the site only where I’m already logged in.
This is the first time I've experienced this. Also note that current sessions on messenger.com still work: we can still send and receive messages, but can't upload images or send stickers. Looking forward to a post mortem analysis on this.
Google then Facebook and Instagram?<p>My hunch is that it's the end of Q1 and people are trying to release code changes so they can pad their Q1 performance reviews "designed and delivered feature X on time in Q1".
Perhaps relevant that npm has been having issues, although they only recently caught and fixed them. Scoped private npm packages were getting Cloudflare 503 errors.
Since it went down for PC and not mobile, I wondered whether it was some kind of audience testing, as part of moving to an app-only platform.
see also: Facebook, Instagram down: Social media sites not working for many, FB doing 'required maintenance'<p>[1] <a href="https://www.abc15.com/news/national/facebook-down-social-media-site-not-working-for-many" rel="nofollow">https://www.abc15.com/news/national/facebook-down-social-med...</a>
My fiance's uncle sent something today saying that, because of a school shooting in Brazil, they were blocking all images and video shared to "WhatsApp, Instagram, Facebook and other social networks". I haven't been able to verify this myself or from any other sources, but I wonder if either people are misinterpreting the FB outage or if Brazil is blocking content and it's having weird ripple effects.