So at one point we were doing scale testing for our product, where we needed to simulate systems running our software connected back to a central point. The idea was to run as many docker containers as we could on a server with 2x24 cores and 512GB of RAM. The RAM needed for each container was very small. No matter what, the system would start to break at around ~1000 containers (this was 4 years ago). After many hours of the normal debugging we did not see anything on the network stack or Linux limits side that we had not already tweaked (so we thought). So out comes strace! Bingo! We found out that the system could not handle the ARP cache with so many endpoints. Playing with net.ipv4.neigh.default.gc_interval and the settings associated with it got us up to 2500+ containers.
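For reference, those knobs live under /proc/sys/net/ipv4/neigh/default: gc_thresh1/2/3 cap the neighbour (ARP) table and gc_interval controls how often the garbage collector runs. A minimal sketch of raising them at runtime follows; the values are illustrative assumptions, not tuned recommendations, and writing these files requires root.

    #!/usr/bin/env python3
    """Sketch: raise the IPv4 neighbour (ARP) table limits so thousands of
    container endpoints don't overflow the cache. Values are illustrative."""
    import pathlib

    NEIGH = pathlib.Path("/proc/sys/net/ipv4/neigh/default")

    SETTINGS = {
        "gc_thresh1": 4096,   # entries below this are never garbage-collected
        "gc_thresh2": 8192,   # soft limit; GC gets more aggressive above it
        "gc_thresh3": 16384,  # hard cap on neighbour table entries
        "gc_interval": 60,    # seconds between GC runs
    }

    for name, value in SETTINGS.items():
        path = NEIGH / name
        old = path.read_text().strip()
        path.write_text(f"{value}\n")
        print(f"{name}: {old} -> {value}")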
Is there any talk of increasing these defaults on higher-memory systems? The low defaults feel like footguns that people stumble into rather than something needed for optimal performance.
The big bottleneck we had with docker containers per host was not sustained peak but simultaneous start. This was with Docker 1.6-1.8, but we’d see containers failing to start if more than 10 or so (sometimes as low as 2!) were started at the same time.

Hopefully rootless docker completely eliminates the races by removing the kernel resource contention.
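One way to work around that kind of race is to stagger the launches instead of firing them all at once. A rough sketch, with the batch size, delay, image, and container names all being placeholder assumptions:

    #!/usr/bin/env python3
    """Sketch: start containers in small batches to avoid a thundering herd
    of simultaneous starts. All values here are placeholders."""
    import subprocess
    import time

    IMAGE = "alpine"   # placeholder image
    TOTAL = 100        # containers to launch overall
    BATCH = 5          # keep concurrent starts low
    DELAY = 2.0        # seconds to wait between batches

    for start in range(0, TOTAL, BATCH):
        for i in range(start, min(start + BATCH, TOTAL)):
            # Detached, long-running no-op so the container stays up.
            subprocess.run(
                ["docker", "run", "-d", "--name", f"scale-test-{i}",
                 IMAGE, "sleep", "infinity"],
                check=True,
            )
        time.sleep(DELAY)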
"Access was initially fronted by nginx with consul-template generating the config. When it did not scale anymore nginx was replaced by Traefik."<p>Wonder why Nginx didn't scale.
"With /proc/sys/kernel/pid_max defaulting to 32768 we actually ran out of PIDs. We increased that limit vastly, probably way beyond what we currently need, to 500000. Actuall limit on 64bit systems is 222"<p>Time to start thinking about 128bit systems!