Most interaction at Google between SRE and developer teams is framed in terms of a "failure budget" (more commonly called an "error budget"). For example, let's say the agreement between the engineers and the product and budget people is that the service needs to have four nines of reliability (99.99% availability); that's the amount of computing and human power they're willing to pay for.<p>Well, that means the service is allowed to be down for about four minutes every month (0.01% of a 30-day month is roughly 4.3 minutes). Let's say for the past three months, the service has actually only been out of SLA for about 30 seconds per month. That means the devs have a bit of failure budget saved up that they can work with.<p>How do you spend a failure budget? Well, let's say you're a developer and you have a new feature that you just finished writing late Thursday night, but the SREs have a rule that no code can be deployed on a Friday. If you have a lot of failure budget saved up, you have more negotiating power to get the SREs to make a special exception.<p>But let's say that this Friday deployment leads to an outage late Saturday night, and the service is down for sixteen minutes before it can be rolled back. That is more than the budget you had saved up plus that month's allowance, so you now have a negative failure budget, and you can expect the SREs to be much stricter in the coming months about extensive unit and cluster testing, load tests, canarying, quality documentation, etc., at least until your budget becomes positive again.<p>The beauty of this system is that it aligns incentives properly. Without it, the devs always want to write cool new code and ship it as fast as possible, and the SREs never want anything to change. With it, the devs have an incentive to avoid shipping bad code, and the SREs have reason to trust them.
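<p>To make the arithmetic in this example concrete, here is a minimal sketch of the bookkeeping. It is only an illustration, not how Google actually tracks budgets: the function names are hypothetical, and it assumes a 30-day month and that unused budget carries over from month to month, which is what "saved up" implies above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_allowance_seconds(availability, days=30):
    """Downtime a service may accrue in one month at the given availability target.

    Assumes a fixed month length (30 days by default).
    """
    return (1.0 - availability) * days * SECONDS_PER_DAY


def remaining_budget_seconds(availability, downtime_by_month, days=30):
    """Budget left after the listed months.

    Assumption: unused budget carries over, so each month adds its full
    allowance and subtracts the downtime actually observed (in seconds).
    """
    allowance = monthly_allowance_seconds(availability, days)
    return sum(allowance - downtime for downtime in downtime_by_month)


if __name__ == "__main__":
    target = 0.9999  # "four nines"

    # Three quiet months with about 30 seconds of downtime each.
    quiet_months = [30, 30, 30]
    print("monthly allowance (s):", round(monthly_allowance_seconds(target), 1))
    print("banked after quiet months (s):",
          round(remaining_budget_seconds(target, quiet_months), 1))

    # Fourth month: the Friday deploy causes a sixteen-minute outage.
    with_outage = quiet_months + [16 * 60]
    print("after the outage (s):",
          round(remaining_budget_seconds(target, with_outage), 1))
```

Under these assumptions the allowance is about 259 seconds a month, the three quiet months bank about 688 seconds, and the sixteen-minute outage in the fourth month leaves the budget roughly 13 seconds in the red.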