I see this kind of thinking all the time in hardware engineering as well, and it all boils down to premature optimization. Cost is almost always the driver.

One example: on a recent project, a very cost-sensitive machine reused a small heater copied over from another product, but no one actually verified that it met the required limits (only the default use case). It turned out not to be quite powerful enough, and by the time we found out it was far too late and expensive to fix. All the engineering time spent figuring that out was wasted too (though management often doesn't seem to count engineering time the way it counts parts cost).

I've since learned that at the beginning of a project it's critical to identify the riskiest parts of the design, isolate them into a module, and over-spec that module, ideally with a path to reducing cost later. But the most important thing I've learned is: don't try to solve tomorrow's problems today!
I've spent quite a bit of time on a problem very similar to this. It's surprisingly challenging. Imagine this scenario:

Some service has three units of capacity available (e.g. VMs). This is the minimum amount allowed, on the theory that things won't break too badly if one of them happens to crash. You target 66% CPU utilization. Suddenly, one goes down, and the software sees 100% CPU utilization on the other two. What should the software do?

Well, the obvious thing is to add one more instance, assuming that one of them crashed and its load shifted to the other two. However, what if what actually happened is that demand doubled, and the load caused the crash? Then you should probably add six more instances (assuming that the two remaining live ones are going to go down while those six are coming up).

If you look only at CPU utilization, it's impossible to tell the difference between these two situations.
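
To make that concrete, here's a toy Python sketch (the numbers are illustrative, not from any real system) of why the CPU signal alone is ambiguous:

    from fractions import Fraction
    from math import ceil

    TARGET = Fraction(2, 3)   # the ~66% CPU utilization target

    def desired_instances(demand):
        """Instances needed, where demand is measured in instance-equivalents of work."""
        return ceil(Fraction(demand) / TARGET)

    # Both cases below look identical to the autoscaler: two live instances at 100% CPU.

    # Case A: one of three instances crashed, demand unchanged (~2 instances' worth).
    print(desired_instances(2))   # 3 -> add one instance

    # Case B: demand doubled (~4 instances' worth) and the overload caused the crash.
    print(desired_instances(4))   # 6 -> you need six, and the two overloaded
                                  #      survivors may not last until they arrive

Both cases present exactly the same observation to the autoscaler, so any policy keyed only to CPU has to guess.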
This is even scarier in the physical world. Just-in-time logistics means companies aren't warehousing inventories as large as they used to. In the case of major events (natural disasters, terrorist attacks, etc.), there isn't enough reserve supply to go around.
It’s important that systems have some design margin (buffers of one kind or another) so that a disruption / transient event in one part of the system is absorbed locally and not passed on to the rest of the system.
It seems like this problem is solved by simply setting a sensible minimum in an autoscaling group, and not an "everyone on Earth was abducted by aliens and stopped using the service" level of minimum.

Say I'm an e-commerce site, and on Black Friday I can see historically (or just make an educated guess, if it's my first holiday sale) that I get "n" requests per second to my service.

I'll set my autoscaling group the day before to be able to handle those "n" requests per second, with the ability to grow if my expectations are exceeded. If my expectations are not met, my autoscaling group simply won't shrink below the minimum. Then, the day after the holiday sale, I can configure my autoscaling group to have a different minimum.

This solves the problem of balancing capacity planning against saving money by not leaving idle resources running.

If you're the type of person who hates human intervention in running your operation, then fine: put in a scheduled config change every year before the sale to change your autoscaling group size.

It's pretty rare to have enormous spikes in application usage without good reason, such as video-game releases, holiday sales, startup launches, or viral social media campaigns.
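
For what it's worth, here's a rough sketch of that scheduled change using boto3, against a hypothetical auto scaling group called "checkout-service" (sizes and dates are made up; derive your own "n" from historical traffic):

    from datetime import datetime, timezone
    import boto3

    autoscaling = boto3.client("autoscaling")

    # Raise the floor the day before the sale...
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="checkout-service",
        ScheduledActionName="black-friday-floor",
        StartTime=datetime(2019, 11, 28, tzinfo=timezone.utc),
        MinSize=30,           # enough capacity for the expected "n" requests/sec
        MaxSize=100,          # room to grow if expectations are exceeded
        DesiredCapacity=30,
    )

    # ...and drop it back down once the sale is over.
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="checkout-service",
        ScheduledActionName="post-sale-floor",
        StartTime=datetime(2019, 12, 2, tzinfo=timezone.utc),
        MinSize=5,
        MaxSize=100,
        DesiredCapacity=5,
    )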
I recently gave a talk at SRECon [1] about a partial solution: using a PID controller. It won't solve all instances of this problem, but properly tuned, it will dampen the effect of these sudden events and shorten the response time to them.

[1] https://www.usenix.org/conference/srecon19emea/presentation/hahn
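
For anyone who doesn't want to sit through the talk, this is the general shape of the idea (not the exact implementation from the presentation): a textbook PID loop in Python with placeholder gains, where tuning kp/ki/kd is the actual hard part.

    class PIDController:
        """Plain proportional-integral-derivative controller."""

        def __init__(self, kp, ki, kd, setpoint):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.setpoint = setpoint
            self.integral = 0.0
            self.prev_error = None

        def update(self, measurement, dt):
            """Return a control output for the latest measurement and time step (seconds)."""
            error = measurement - self.setpoint
            self.integral += error * dt
            derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    # Hypothetical usage: every metrics interval, turn the output into a replica delta.
    pid = PIDController(kp=10.0, ki=0.02, kd=1.0, setpoint=0.66)   # target 66% CPU
    replicas = 3
    for observed_cpu in (0.66, 0.70, 0.95, 1.00, 0.80):
        delta = pid.update(observed_cpu, dt=30.0)
        replicas = max(3, replicas + round(delta))

The derivative term is what damps overshoot after a sudden spike, and the integral term keeps pushing until the steady-state error actually goes away.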
> Of course, at some point, [...] the local service gets restarted by the ops team (because it can't self-heal, naturally)

Maybe off-topic, but what are some good strategies for the kind of "self-healing" being talked about here? If a service needs to be restarted, how could you automate the detection and restart process?
There's something related called the bullwhip effect. I *think* that throwing away requests under load, rather than putting them in some overflow queue, prevents it: the effect isn't magnified down the chain of services as each one scales up, because each service only ever sees live incoming traffic rather than an accumulated backlog.
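
A toy illustration of the difference, in Python with made-up numbers: a bounded queue sheds the excess at the edge, while an unbounded one accumulates a backlog that later gets replayed at full force against everything downstream.

    from collections import deque

    class SheddingQueue:
        """Bounded queue: reject work beyond capacity instead of buffering it."""

        def __init__(self, max_depth):
            self.max_depth = max_depth
            self.items = deque()
            self.rejected = 0

        def offer(self, item):
            if len(self.items) >= self.max_depth:
                self.rejected += 1       # caller gets an error (e.g. a 429) and backs off
                return False
            self.items.append(item)
            return True

    q = SheddingQueue(max_depth=100)
    for request in range(1000):          # a burst far above capacity
        q.offer(request)
    print(len(q.items), q.rejected)      # 100 buffered, 900 shed immediately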
Dynamically scaling down based on CPU consumption is the wrong way to do it, IMO. If your site is decently sized, you have a pretty typical diurnal pattern with weekly cyclical variation; that's your baseline.
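
Roughly something like this sketch (Python, with a hypothetical `history` mapping of (weekday, hour) to observed requests/sec from previous weeks): set the capacity floor from the time-of-week baseline instead of reacting to instantaneous CPU.

    import math
    from statistics import mean

    def baseline_rps(history, weekday, hour, headroom=1.3):
        """Expected load for this time of week, padded with some headroom."""
        return mean(history[(weekday, hour)]) * headroom

    def min_instances(history, weekday, hour, rps_per_instance=100.0, floor=3):
        """Capacity floor derived from the diurnal/weekly baseline."""
        return max(floor, math.ceil(baseline_rps(history, weekday, hour) / rps_per_instance))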
But if your service was down for longer than it takes to downscale to the minimum, scaling back up is not that big of an issue; it was down anyway. Also, 24/7 instances exist for a reason: autoscaling is there to handle spikes, not normal traffic.
That just means you should scale based on the work to be done rather than poor proxies such as CPU utilization. Also, set a reasonable minimum and maximum based on observed load in production, and review them as part of regular operational reviews.
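
As a sketch of what "scale on the work to be done" can look like in practice (illustrative numbers; the real signal might be queue depth, request rate, or both):

    import math

    MIN_INSTANCES = 3     # reviewed against observed baseline load
    MAX_INSTANCES = 50    # reviewed against budget and downstream capacity

    def desired_instances(backlog, arrival_rate, per_instance_rate, drain_target_s=60.0):
        """Enough instances to keep up with arrivals and drain the backlog in about a minute."""
        needed = (arrival_rate + backlog / drain_target_s) / per_instance_rate
        return max(MIN_INSTANCES, min(MAX_INSTANCES, math.ceil(needed)))

    # e.g. 12,000 queued messages, 400 msg/s arriving, each instance handles 50 msg/s
    print(desired_instances(12_000, 400, 50))   # 12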
A good edge case to consider when designing an autoscaling service, but now that I'm aware of it, I think I can design around the problem with some combination of the suggested solutions and still get the autoscaling that the article seems to be trying to talk me out of...
If scaling up is painful, there is something wrong with the architecture. Aside from this scenario, what if you just get a spike in traffic? If your scaling solution can't handle that, get a better one; otherwise, what's the point?