I've spent quite a bit of time on a problem very similar to this, and it's surprisingly challenging. Imagine this scenario:<p>A service has three units of capacity (e.g. VMs). That's the minimum allowed, on the theory that things won't break too badly if one of them crashes. You target 66% CPU utilization. Suddenly one instance goes down, and the software sees 100% CPU utilization on the other two. What should the software do?<p>The obvious answer is to add one more instance, assuming that one crashed and its load shifted to the other two. But what if demand actually doubled, and the load caused the crash? Three instances at 66% is about two instances' worth of work, so double that needs about six instances at the same target. Then you should probably add six more (assuming the two remaining live ones are going to go down while those six are coming up).<p>If you look only at CPU utilization, it's impossible to tell these two situations apart.
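<p>To make the arithmetic concrete, here's a rough Python sketch of the two scenarios. It's only an illustration, not how any particular autoscaler works: the function names are made up, demand is measured in "fully busy instances", and "66%" is treated as exactly two thirds so the numbers come out even.

```python
from fractions import Fraction
import math

TARGET = Fraction(2, 3)  # assumption: "66%" treated as exactly two thirds
MIN_INSTANCES = 3        # minimum fleet size from the scenario


def observed_cpu(demand, live):
    """Per-instance CPU utilization as a monitor sees it; it saturates at 100%."""
    return min(Fraction(1), Fraction(demand) / live)


def instances_needed(demand):
    """Instances required to serve `demand` at the target utilization."""
    return max(MIN_INSTANCES, math.ceil(Fraction(demand) / TARGET))


def instances_to_add(demand, surviving):
    """New instances to launch, given how many live ones are expected to survive."""
    return max(0, instances_needed(demand) - surviving)


# Three instances at the target utilization carry two instances' worth of load.
baseline_demand = 3 * TARGET

# Scenario A: one instance crashed, demand unchanged.
crash = {"demand": baseline_demand, "live": 2}
# Scenario B: demand doubled and the load took one instance down.
surge = {"demand": 2 * baseline_demand, "live": 2}

# Both scenarios show the same saturated reading on the two live instances.
print(observed_cpu(**crash) == observed_cpu(**surge))  # True

# Scenario A: the two survivors stay up, so one replacement is enough.
print(instances_to_add(crash["demand"], surviving=2))  # 1
# Scenario B: assume the two survivors also die while new capacity boots.
print(instances_to_add(surge["demand"], surviving=0))  # 6
```

The monitor reports 100% in both cases because utilization can't go above 100%, so the reading hides how much demand is actually going unserved. The right answer (1 vs. 6) depends entirely on the true demand, which CPU alone doesn't reveal.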