This is a really fantastic set of general "how to tune kubernetes and the various components for large clusters". Thanks for writing this up!
I'm surprised that the scaling story of k8s (+etcd?) is still so far behind mesos/zk. There have been mesos clusters at over 10k nodes for several years now.

I have never personally needed more than a few hundred mesos agents, but these have been added without any noticeable impact on our extremely modestly provisioned (and multi-purpose) zk cluster or any other components.

Has anyone used both systems and can speak to any advantages of k8s for these types of workloads?

Also, is anyone using some kind of torrent approach as a more reasonable solution to avoid network bottlenecks when distributing big docker images to a large number of nodes?
What I find amazing about k8s is that it's one of the first solutions that is relatively simple for a small cluster (HA, while scheduling stuff on the masters), but can scale amazingly well even for a big cluster.
You can start with 3 nodes with like 8GB per machine (or less; I guess even 2GB is feasible if you only want to use like 1-1.5GB of memory per machine).
(non-HA can of course be smaller)
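For the "schedule stuff on the masters" part, here's a minimal sketch, assuming a kubeadm-style setup where the control-plane nodes carry the default NoSchedule taint:

    # Allow regular workloads to land on the master/control-plane nodes
    # by removing the node-role.kubernetes.io/master taint from every node.
    kubectl taint nodes --all node-role.kubernetes.io/master-

    # Check that no NoSchedule taints remain.
    kubectl describe nodes | grep -i taint

Once the cluster grows and master memory gets tight, you can re-add the taint and push workloads onto dedicated workers.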
350TB of memory and 50,000 cores, nice.

ARP caching seems to be a common issue in cloud environments. AWS recommends turning it off and does so itself in their Amazon Linux distro.
Ran into the ARP scaling issues when trying to put 1000 containers on a single system for scale testing over a year ago. strace helped figure out where the issue was and what settings to change. I guess I should have sent an email to the mailing list. At that time, searching for how to scale to 1000 docker containers came up empty; everything was "hey, here is how I scaled to 1000 containers over X number of nodes". No one was crazy enough to try to get 1000 on a single machine.
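In case it saves someone else the strace session: the settings in question are the kernel's neighbor (ARP) table garbage-collection thresholds. A rough sketch, where the exact values are illustrative and depend on how many addresses each node actually talks to:

    # /etc/sysctl.d/99-arp-cache.conf
    # Raise the neighbor (ARP) table limits so thousands of container IPs
    # per node don't trigger "neighbor table overflow" and dropped entries.
    net.ipv4.neigh.default.gc_thresh1 = 80000   # below this, entries are never GC'd
    net.ipv4.neigh.default.gc_thresh2 = 90000   # soft limit, GC becomes aggressive
    net.ipv4.neigh.default.gc_thresh3 = 100000  # hard limit on table size

    # Apply without a reboot:
    sysctl --system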
Isn't it a problem to have etcd store its state on a non-persistent volume?

How do they recover it after a restart? I suppose it's not a manual process.
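Not sure what they do specifically, but the usual pattern is to treat etcd as rebuildable: take periodic snapshots, ship them somewhere durable, and restore on loss. A hedged sketch with etcdctl (the endpoints, cert paths and directories here are illustrative):

    # Take a snapshot of the running cluster (run periodically, ship off-node).
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/client.crt --key=/etc/etcd/client.key \
      snapshot save /backup/etcd-snapshot.db

    # After losing the data dir, restore the snapshot into a fresh directory
    # and point etcd at it before starting the member again.
    ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
      --data-dir /var/lib/etcd-restored

And if only a single member loses its disk while a quorum survives, it can usually just be removed and re-added as a fresh member and sync from its peers, no snapshot needed.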