"Operational Safety" is the neglected child of software operations. I saw how it was implemented effectively when working at AWS, but the broader software ecosystem appeared oblivious to this key concept. While the CrowdStrike outage caused havoc, its silver lining is that Operational Safety has now become a key consideration for software leaders, all the way to CIOs.
It must stay this way: complex, mission-critical systems will continue to rely more and more on software, and cascading failures are a fact of life in such systems.