The argument against this is consistency. Without a limit set, you are only guaranteed up to your request's worth of CPU, but you will often be allowed more. This can create a false sense of security: your application works fine (even though it occasionally exceeds its request), until one day a neighbor happens to get thirsty and your application suddenly breaks. Limits front-load the brokenness so that it happens immediately instead of randomly.
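Roughly, the trade-off looks like this (a minimal sketch; the pod and image names and the numbers are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-example            # hypothetical pod for illustration
spec:
  containers:
  - name: burstable            # request only: may borrow idle CPU until a neighbor needs it
    image: example/app:latest
    resources:
      requests:
        cpu: "500m"
  - name: capped               # request plus an equal limit: throttled at 500m from day one,
    image: example/app:latest  # so an undersized request shows up immediately
    resources:
      requests:
        cpu: "500m"
      limits:
        cpu: "500m"
```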
This advice comes from tunnel vision and makes perfect sense if you know that you have exactly two pods running at any given time. But if you have exactly two pods, then why bother with k8s? IIRC one of the major selling points of K8s was on-demand scaling, i.e. autoscaling horizontally, which means the number of pods you have in the cluster is dynamic.

In the context of pods dynamically spinning up and down, it's bad when a pod replica can't be allocated in the cluster "predictably", but there is nothing worse than a new pod (new deployment) failing because "Marcus the pod" drank all the water and now I have to call DevOps and wait god knows how long for them to spin up a new node to guarantee a spot for the new pod.

Bin-packing is already an NP-hard problem. If you remove limits from CPU, then you're adding probabilities into the mix. So, for the love of god, always use limits unless you have a very specific use case.
The reason to never use CPU limits is different from the ones stated in the article. In short: the Linux kernel SUCKS. More specifically, the "Completely Fair Scheduler" (CFS) sucks at enforcing those limits. Setting any limit at all causes CFS to waste something like half of the CPU cycles on enforcing it, leaving only the other half available for useful work.
Looks to me like the author hasn't run different workloads in different production clusters of any complexity. The advice is fine for a small, predictable cluster but too simplistic for any real, complex cluster.
We use CPU limits at work for the simple reason that we can't autoscale deployments without having them set. An HPA will deploy a new pod each time the CPU limit has been reached for more than 30 seconds.

The whole point is to scale out, not up.
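For reference, a minimal autoscaling/v2 HPA along these lines (a sketch; the Deployment name and the numbers are made up):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # add replicas when average CPU utilization stays above 80%
```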
I don’t agree with the recommendation for memory, « Always set your memory requests equal to your limits ».

You can layer high-priority and low-priority services better if you leave some buffer.
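Something like this for the lower-priority service (sketch only; the numbers are made up):

```yaml
resources:
  requests:
    memory: "256Mi"   # what the scheduler reserves for the pod
  limits:
    memory: "512Mi"   # burst ceiling; the gap above the request is the buffer
```

A pod whose memory request is below its limit falls into the Burstable QoS class, so under node memory pressure it tends to be evicted before Guaranteed pods, which is what makes the layering work.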