In my experience, HPAs are awesome! Once you've defined your sweet spot of buffer pods for quick scaling, they are well worth the effort!<p>It's the super simple stuff that ends up saving you big bucks on your cloud budget: scaling down staging on the weekend, or even scaling all feature deployments to 0 when you know nobody will be working on them.<p>If you pair the HPA with a decent node autoscaler, THAT, in my opinion, is the game changer of cloud-managed Kubernetes over the bare-metal deployments that I have done.
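The weekend trick can be as simple as a CronJob. A rough sketch of the idea, assuming a feature namespace called feature-env and a deployment-scaler ServiceAccount with RBAC permission to scale deployments (all names and the schedule here are placeholders); a mirror job with an inverse schedule scales things back up on Monday:<p><pre><code>apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekend-scale-down
  namespace: feature-env            # placeholder namespace
spec:
  schedule: "0 20 * * 5"            # Friday 20:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-scaler  # assumed SA allowed to scale deployments
          restartPolicy: Never
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl scale deployment --all --replicas=0 -n feature-env
</code></pre>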
HPAs can definitely save you a lot of money when running Kubernetes and they are extremely useful, especially for non-production environments where you want to be as efficient as possible.<p>Strategies I have used in the past for saving money (a quick sketch of 1 and 2 follows the list):<p><pre><code> 1) Set requests very low for your pods. Look at the minimum CPU/memory that your pods need to start and set it to that. Limits can be whatever.
 2) Set min replicas to 1. This is a non-production environment; nobody cares if an idle pod goes away in the middle of the night.
3) Use spot instances for your cluster nodes. 80% savings is nice!
 4) Increase the number of allowed pods per node so more small pods can be bin-packed onto fewer nodes. GKE sets the default to 110 pods per node, but it can be raised when you create the cluster or node pool.
 5) Evaluate your nodes and determine whether it makes more sense to have fewer large nodes or many smaller nodes. If you run a lot of DaemonSets, fewer large nodes may make sense, since every DaemonSet puts one pod on each node.
 6) Look at the CPU and memory utilization of your nodes. Are you using a lot of CPU but not much memory? Maybe you need to change the machine type you are using so that you get close(r) to 100% CPU and memory utilization. You are just wasting money if you are only using 50% of the available memory of your nodes.
 7) Use something like Knative or KEDA for 'intelligent autoscaling'. I've used both extensively and I found KEDA to be considerably simpler to use. Being able to scale services down to 0 pods is extremely nice!</code></pre>
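To make points 1 and 2 concrete, a minimal sketch of the pattern; the 'api' name, image, and exact request values are placeholders, not recommendations:<p><pre><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                        # placeholder
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: example/api:latest  # placeholder
        resources:
          requests:                # 1) only what the pod needs to start
            cpu: 50m
            memory: 128Mi
          limits:                  # limits can be whatever
            cpu: "1"
            memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 1                   # 2) one idle pod is fine off-hours
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
</code></pre>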
Highly recommend checking out KEDA (<a href="https://keda.sh/" rel="nofollow">https://keda.sh/</a>), which leverages HPAs under the hood.<p>If you need to scale based on some internal data like database records, Redis queues, Kafka topics, etc., KEDA scalers are incredibly easy to hook up to do that. You could even write your own custom scaler if there is no existing one for your type of event source.
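For example, a ScaledObject that drains a Redis list looks roughly like this; the worker deployment, the Redis address, and the jobs list name are assumptions for illustration:<p><pre><code>apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker                   # assumed deployment consuming the queue
  minReplicaCount: 0               # scale to zero when the list is empty
  maxReplicaCount: 20
  triggers:
  - type: redis
    metadata:
      address: redis.default.svc.cluster.local:6379  # assumed Redis endpoint
      listName: jobs               # assumed list holding pending work
      listLength: "10"             # target pending items per replica
</code></pre>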
If the author is here: the illustration in the "How does Horizontal Pod Autoscaler work?" section shows incorrect before/after CPU utilization percentages, given the text/logic.
I'm curious whether anyone has found a sweet spot for autoscaling ingress gateways in terms of CPU saturation. I've found that tail latencies start to get high above 60% utilization.
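For what it's worth, if ~60% is where latency degrades, one option is to aim the HPA below that knee; a sketch with placeholder names, a 50% target, and no scale-up stabilization delay so bursts don't sit at high saturation:<p><pre><code>apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-gateway            # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-gateway
  minReplicas: 3                   # headroom for sudden traffic
  maxReplicas: 30
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # don't delay scale-ups
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50     # stay under the ~60% tail-latency knee
</code></pre>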