> Hope is a strategy [...] At some point we have to hope and assume. for example eventually we hope the compiler authors did a good job with the next version we are about to use, or we assume that the kernel fix was good.<p>Or you can do a very slow, controlled rollout of the new version and watch what happens. On every system I worked on at Google, both as an SRE and as a SWE, a new version went out in stages. First we'd update <i>one task in one cluster</i>, let it run for a day or two, and check the logs and metrics. If that looked fine, we'd update one job in one cluster, then an entire cluster (the designated canary cluster), and only then release to the remaining clusters. If any stage went wrong, we'd roll back or patch, depending on the severity. We rarely missed our weekly releases.<p>I'm sure the same approach works for a new compiler version or a kernel fix. You don't need to assume anything.
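<p>The staged process described above can be sketched as a loop that widens the rollout one scope at a time and stops at the first unhealthy stage. This is a minimal Python sketch, not Google's actual release tooling: the stage labels are taken from the comment, while `staged_rollout`, `deploy`, `healthy` and `rollback` are hypothetical names for hooks you would wire to your own deployment system and monitoring. It only models the roll-back path, not the "patch forward" option.

```python
from typing import Callable, List

# Rollout scopes in the order described in the comment (labels only; hypothetical).
STAGES: List[str] = [
    "one task in one cluster",
    "one job in one cluster",
    "canary cluster",
    "all remaining clusters",
]


def staged_rollout(
    stages: List[str],
    deploy: Callable[[str], None],    # hypothetical hook: push the new version to a scope
    healthy: Callable[[str], bool],   # hypothetical hook: did logs/metrics look OK?
    rollback: Callable[[str], None],  # hypothetical hook: restore the old version in a scope
) -> bool:
    """Deploy stage by stage; on the first unhealthy stage, roll back everything
    deployed so far (newest first) and return False. Return True if all stages pass."""
    deployed: List[str] = []
    for stage in stages:
        deploy(stage)
        deployed.append(stage)
        # In practice you would let the new version soak here (e.g. a day or two
        # for the first task) before checking logs and metrics.
        if not healthy(stage):
            for done in reversed(deployed):
                rollback(done)
            return False
    return True


# Usage: simulate a release whose canary cluster looks bad.
log: List[str] = []
ok = staged_rollout(
    STAGES,
    deploy=lambda s: log.append(f"deploy {s}"),
    healthy=lambda s: s != "canary cluster",
    rollback=lambda s: log.append(f"rollback {s}"),
)
print(ok)  # False
```

Rolling back every stage deployed so far, newest first, means a bad version never reaches the remaining clusters and the partially updated scopes end up on the old version again.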