
The Container Throttling Problem

28 points by r4um, over 3 years ago

3 comments

kevincox, over 3 years ago
Honestly, it sounds like the main problem here is the scheduler. I'm not saying you should run many massive threadpools, but at the end of the day, if you have a latency-sensitive service that isn't being given CPU for seconds at a time, your scheduler isn't suited for a latency-sensitive service.

Bursting is *good*. You are using resources that would otherwise be idle. It sounds here like the scheduler is punishing the task for the scheduler's mistake. CFS ensures that the job gets N cores *on average*; what you actually want is a scheduler that ensures the job gets N cores *minimum*.

So while having too many threads lying around puts some unnecessary pressure on the scheduler and wastes memory, I don't think it should be causing these huge latency spikes.
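A quick way to check whether CFS quota throttling is what is actually hitting a service is to watch the counters the kernel exposes in the cgroup filesystem. The sketch below is only an illustration and assumes a cgroup v2 mount at /sys/fs/cgroup; under cgroup v1 the same counters live under .../cpu/cpu.stat, with the time reported as throttled_time in nanoseconds.

```go
// Sketch: read CFS throttling counters from the cgroup filesystem.
// Assumes cgroup v2 mounted at /sys/fs/cgroup; adjust the path for
// cgroup v1 or a different mount point.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/sys/fs/cgroup/cpu.stat") // cgroup v2 layout (assumption)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open cpu.stat:", err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		switch fields[0] {
		case "nr_periods", "nr_throttled", "throttled_usec":
			// nr_throttled / nr_periods is roughly the fraction of CFS
			// periods in which the cgroup hit its quota and was paused.
			fmt.Printf("%-15s %s\n", fields[0], fields[1])
		}
	}
}
```

A nr_throttled that climbs along with nr_periods means the container keeps exhausting its quota within the CFS period (100 ms by default) and is paused until the next period starts, which is the kind of stall being discussed here.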
closeparen, over 3 years ago
We have big problems with this at work, in particular an autoscaler that assumes services can use 100% of their CPU allocations. As Dan describes, this isn't true. But the autoscaler is both a cost-saving measure and a "dumb product engineers don't understand capacity planning" measure, so it can't be turned off, only downtimed for a while. For certain services, if we forget to renew the downtime, it's a guaranteed outage when we get downscaled and tail latencies degrade. Fun times.
Tibbes, over 3 years ago
I saw this problem recently at work, with a Go program running on Kubernetes. You can work around it by setting GOMAXPROCS to the same value as the CPU limit in the container spec.

(So be careful not to assume this problem is specific to Java, the JVM, Mesos, or Twitter's environment.)
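For reference, a minimal sketch of that GOMAXPROCS workaround, assuming a cgroup v1 layout where the container's limit shows up as cpu.cfs_quota_us / cpu.cfs_period_us; a library such as go.uber.org/automaxprocs does the same thing and also handles cgroup v2.

```go
// Sketch: cap GOMAXPROCS at the container's CFS quota so the Go runtime
// doesn't schedule more parallelism than the container is allowed to use.
// Paths assume cgroup v1; go.uber.org/automaxprocs covers both cgroup
// versions automatically.
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// cfsCPUs derives the whole-CPU equivalent of the container's CFS quota.
func cfsCPUs() (int, bool) {
	quota, err1 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	period, err2 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return 0, false // no quota set, or not running under cgroup v1
	}
	return int(quota / period), true
}

func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	if n, ok := cfsCPUs(); ok && n >= 1 {
		runtime.GOMAXPROCS(n)
	}
	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
}
```

Capping GOMAXPROCS this way keeps the runtime from running more Go code in parallel than the quota covers, so the container stops burning through its quota early in each CFS period and then stalling for the remainder.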