When things break in production, my first instinct is to find and revert what changed. How are folks tracking all the changes made in production? There are deploys, experiment changes, feature flags, config changes, custom scripts and more to keep track of.
Not me personally, but the last place I worked had change managers. Jira tickets had to be tagged, reviewed, risk-ranked and approved before even being considered for the change management meetings. Every change would be scrutinized in the meeting and re-risk-ranked or declined if need be. The final changes would go on a Confluence/Jira dashboard and would be executed in a specific order during the change windows. This process cut our unplanned outages down considerably and kept the change windows smaller and customers happier.<p>As for reverting changes, the lead engineers and the change manager would sort that out. There would be RCAs (root cause analyses) to determine what went wrong. No finger pointing, just preventing future flubs.<p>It was rough at first, but once people got into the groove, changes moved along much more smoothly and people were able to get some sleep.