Bro, redundancy. If you want a resilient service, your operations team will build in automatic redundancy. They’ll even build in auto-failover so they don’t have to wake up at 2am to fix something if a PDU fails. You don’t even have to ask for that. You just hire well and they do it because it’s the correct way to do it.<p>But if you fire the majority of your operations team and then go unplugging racks, you’re going to reach a point where you’ve taken out the redundancy and the whole service falls over on its face.<p>Whatever, I shouldn’t have to explain these topics to you. A) you should know them, B) if you don’t know them after high-level work at Tesla, SpaceX and PayPal, you obviously can’t learn them (and god help the people doing reliability work at those companies), and C) I don’t really care if your little empire crumbles to dust.<p>Go nuts boyo, maybe the money you save on your power bill can make up that extra billion a year your ill-considered acquisition is costing the company. That’s a joke; it can’t. The only hope was advertising, which depends on the people you drive away with your antics. Perhaps next time bother to learn the business model of a company you plan on acquiring before you gleefully kill the goose that lays the golden egg.
This may explain why, in my anecdotal experience, the Twitter web app's "Report Tweet" flows and Notifications are currently broken (for example, clicking "done" in the Report an issue modal randomly starts acting like the browser's back nav). Some Twitter features have been really wonky for the last 2 days or so.
I will never buy a Tesla. If this is his best-practice approach, I am speechless. Even if you do things like that, do them in a sane, scientific manner as a test, and not like a fool who plays with too many buttons as superadmin.
When I ran a datacenter in California, our company was acquired by a company in Texas. The Texans came out and started pulling that crap. I advised the customers of the causes of their outages, and it stopped, given that it impacted their rather expensive service level agreements.<p>I suppose in this case Elon can absorb any revenue losses, but surely there is a better way to ease in a chaos monkey test in a more professional and strategic manner. One could come up with a tech-refresh plan that implements a design retrofit for anything that does not have N+1 automated redundancy.
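For what it's worth, the "anything that does not have N+1 automated redundancy" list is something you can compute before touching a plug. Here is a rough Python sketch, assuming you have some inventory of redundancy groups. The `Group` fields, the numbers, and the names are all made up for illustration and do not describe any real system.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One redundancy group, e.g. the PDUs feeding a row (hypothetical schema)."""
    name: str
    units: int            # installed units in the group
    unit_capacity: float  # capacity of a single unit (any consistent unit)
    peak_load: float      # peak load the group must carry
    auto_failover: bool   # True if failover needs no human intervention


def has_n_plus_1(g: Group) -> bool:
    """True if the group still carries peak load with one unit down, automatically."""
    survives_one_failure = (g.units - 1) * g.unit_capacity >= g.peak_load
    return g.auto_failover and g.units >= 2 and survives_one_failure


def retrofit_list(groups: list[Group]) -> list[str]:
    """Names of groups that belong in the tech-refresh plan."""
    return [g.name for g in groups if not has_n_plus_1(g)]


# Made-up example inventory.
INVENTORY = [
    Group("pdu-row-a", units=2, unit_capacity=10.0, peak_load=8.0, auto_failover=True),
    Group("ups-hall-1", units=3, unit_capacity=5.0, peak_load=12.0, auto_failover=True),
    Group("core-switch", units=2, unit_capacity=40.0, peak_load=20.0, auto_failover=False),
]

if __name__ == "__main__":
    print(retrofit_list(INVENTORY))
```

In this toy inventory, `ups-hall-1` fails because two remaining units cannot carry the peak load, and `core-switch` fails because its failover is manual. Those are the things you retrofit first, and only then start pulling cables.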
Hitting the switch tests assumptions and builds confidence. That's all great when one knows the redundancy setup. In this case one doesn't, so one shouldn't.<p>For scrappy startups where the main investor has admin access, the mean time to shut down is measured in months.
To a non-engineering manager or non-SRE, this may seem absurd.<p>We do these in a more planned way where I work and call them "fire drills".<p>Everything prod should be HA and evict-able from a particular server, rack, and/or DC floor and survive. If it's not, it's a sign something is wrong.<p>The goal isn't fun, it's to find corner cases and places that aren't HA by meatcloud, chaos monkey-style.<p>Ideally, lights-out PDUs and Ethernet switch port control can simulate "pulling plugs" in an automated way and do so remotely and more cheaply.<p>If you bought a company, maybe you'd do this once to get a baseline, but with the express intention of not doing it for fun or to be a jerk.
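A dry run of that kind of fire drill can be done on paper first. Below is a minimal Python sketch, assuming you can export where each service's replicas live. The placement map and the server, rack, and floor names are invented for illustration. It lists which services would lose every replica if a given server, rack, or floor went dark.

```python
# Hypothetical placement map: service -> list of (server, rack, floor) per replica.
PLACEMENTS = {
    "api": [("s1", "r1", "f1"), ("s2", "r2", "f1")],
    "db": [("s3", "r1", "f1"), ("s4", "r1", "f1")],
    "cache": [("s5", "r3", "f2")],
}

# Position of each failure-domain level inside a placement tuple.
LEVELS = {"server": 0, "rack": 1, "floor": 2}


def drill_report(placements, level):
    """Map each failure domain at `level` to the services it would take down.

    A service counts as down only if all of its replicas sit in that domain.
    Domains that take nothing down are left out of the report.
    """
    idx = LEVELS[level]
    domains = {p[idx] for replicas in placements.values() for p in replicas}
    report = {}
    for domain in sorted(domains):
        down = sorted(
            svc
            for svc, replicas in placements.items()
            if all(p[idx] == domain for p in replicas)
        )
        if down:
            report[domain] = down
    return report


if __name__ == "__main__":
    for level in LEVELS:
        print(level, drill_report(PLACEMENTS, level))
```

Anything that shows up in the report is not HA at that level, so it should be fixed before anyone simulates the failure for real with a PDU or switch port.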
This is like a person unfamiliar with a large program going in and randomly removing lines of code, then trying the program. If the program still runs, obviously those lines of code were unnecessary, right?<p>Damn he's dumb. He needs a nanny.