We have big problems with this at work, in particular with an autoscaler that assumes services can use 100% of their CPU allocations. As Dan describes, this isn't true. But the autoscaler is both a cost-saving measure and a "dumb product engineers don't understand capacity planning" measure, so it can't be turned off, only downtimed for a while. For certain services, if we forget to renew the downtime, it's a guaranteed outage: we get downscaled and tail latencies degrade. Fun times.
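
To put rough numbers on why the 100% assumption hurts, here is a minimal sketch of the sizing arithmetic. Everything in it is hypothetical: the function name `replicas_needed`, the 12-core demand, the 2-core allocation, and the 60% target are made up for illustration and are not taken from our actual autoscaler.

```python
def replicas_needed(demand_millicores: int,
                    per_replica_millicores: int,
                    target_pct: int) -> int:
    """Replicas required so each one stays at or below target_pct of its CPU allocation.

    Hypothetical sizing rule, in integer math to avoid float rounding:
    ceil(demand / (per_replica * target_pct / 100)).
    """
    usable = per_replica_millicores * target_pct   # usable capacity per replica, scaled by 100
    scaled_demand = demand_millicores * 100        # demand scaled by the same factor
    return -(-scaled_demand // usable)             # ceiling division


# Made-up workload: 12 cores of demand, 2 cores allocated per replica.
at_full = replicas_needed(12_000, 2_000, 100)      # what a "100% is usable" autoscaler picks
with_headroom = replicas_needed(12_000, 2_000, 60) # assumed 60% target, for comparison

print(at_full, with_headroom)
```

With these numbers, the 100% assumption lands on 6 replicas where a 60% target would ask for 10, and that gap is the headroom that disappears when we get downscaled.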