Scaling Kubernetes to 2,500 Nodes

265 points by stanzheng over 7 years ago

9 comments

SEJeff over 7 years ago
This is a really fantastic set of general tips on "how to tune Kubernetes and its various components for large clusters". Thanks for writing this up!
drewrobb over 7 years ago
I'm surprised that the scaling story of k8s/(+etcd?) is still so far behind mesos/zk. There have been mesos clusters at over 10k nodes for several years now.

I have never personally needed more than a few hundred mesos agents, but these have been added without any noticeable impact on our extremely modestly provisioned (and multi-purpose) zk cluster or any other components.

Has anyone used both systems and can speak to any advantages of k8s for these types of workloads?

Also, is anyone using some kind of torrent approach as a more reasonable solution to avoid network bottlenecks when distributing big Docker images to a large number of nodes?
merb over 7 years ago
What I find amazing about k8s is that it's one of the first solutions that is relatively simple for a small cluster (HA, while scheduling stuff on the masters), but can scale amazingly well even for a big cluster. You can start with 3 nodes with around 8 GB per machine (or less; I guess even 2 GB is feasible if you only want to use around 1 to 1.5 GB of memory per machine). (Non-HA can of course be smaller.)
roscoebeezie over 7 years ago
As a person who doesn’t understand containers, where do I go to learn the basics?
djb_hackernews over 7 years ago
350TB of memory, and 50,000 cores, nice.

ARP caching seems to be a common issue in cloud environments. AWS recommends turning it off and does so itself in their Amazon Linux distro.
myrandomcomment over 7 years ago
Ran into the ARP scale issues when trying to put 1000 containers on a single system for scale testing over a year ago. strace helped figure out where the issue was and what settings to change. I guess I should have sent an email to the mailing list. At that time, searching for how to scale to 1000 Docker containers got you nowhere, because every result was "hey, here is how I scaled to 1000 containers over X number of nodes". No one was crazy enough to try to get 1000 on a single machine.
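
For anyone who wants to check whether a host is close to the same kind of limit: the comment above does not say which settings were changed, so the following is only a guess at the relevant knobs. On Linux the size of the neighbor (ARP) table is bounded by the `gc_thresh1`, `gc_thresh2` and `gc_thresh3` sysctls under `net.ipv4.neigh.default`. A minimal Python sketch, assuming a Linux host that exposes those values under `/proc`, compares the current number of ARP entries against them. The function names and the "ok" / "soft-limit" / "full" labels are made up for illustration.

```python
from pathlib import Path

# Assumed locations on a typical Linux host. Both are parameters so the
# functions can be pointed at fixture files instead of the live system.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
ARP_TABLE = Path("/proc/net/arp")


def read_gc_thresholds(neigh_dir=NEIGH_DIR):
    """Return the three neighbor-table GC thresholds as a dict of ints."""
    return {
        name: int((Path(neigh_dir) / name).read_text().strip())
        for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    }


def count_arp_entries(arp_table=ARP_TABLE):
    """Count rows in a /proc/net/arp style table, skipping the header line."""
    lines = Path(arp_table).read_text().splitlines()
    return sum(1 for line in lines[1:] if line.strip())


def arp_pressure(entries, thresholds):
    """Classify table usage against the thresholds (labels are hypothetical)."""
    if entries >= thresholds["gc_thresh3"]:
        return "full"        # at or above the hard maximum
    if entries >= thresholds["gc_thresh2"]:
        return "soft-limit"  # garbage collection becomes aggressive here
    return "ok"


if __name__ == "__main__":
    limits = read_gc_thresholds()
    used = count_arp_entries()
    print(f"{used} ARP entries, limits {limits}: {arp_pressure(used, limits)}")
```

This only reads the IPv4 table in the current network namespace, so on a host running many containers it shows part of the picture at best.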
eggie5 over 7 years ago
Does OpenAI train w/ GPUs on k8s clusters?
EDevil over 7 years ago
Isn’t it a problem to have etcd store its state on a non-persistent volume?

How do they recover it after a restart? I suppose it's not a manual process.
bdburns over 7 years ago
(Azure containers lead here) Awesome to see OpenAI scale Kubernetes on Azure!