A few pieces of advice based on running https://github.com/mozilla-services/autopush-rs, which handles tens of millions of concurrent connections across a fleet of small EC2 instances.

1) Consider distributing your workload across smaller instances rather than running the largest instance that can handle it. This allows progressive rollouts to test new versions, reduces the thundering herd when you restart or replace an instance (a client-side jitter sketch follows below), etc.

2) Don't set up security group rules that restrict which addresses can connect to your websocket port. As soon as you do, connection tracking kicks in and you'll hit undocumented hard limits on the number of established connections to an instance. These limits vary with instance size and can easily become your bottleneck.

3) Beware of ELBs. Under the hood an ELB is made of multiple load balancers and is supposed to scale out when those load balancers hit capacity. A single load balancer can only handle so many concurrent connections, and in my experience ELBs don't automatically scale out when that limit is reached; you need AWS support to do it manually for you. At a certain traffic level, expect support to tell you to create multiple ELBs and distribute traffic across them yourself. ALBs or NLBs may handle this better; I'm not sure. If possible, design your system to distribute connections itself instead of requiring a load balancer (see the hashing sketch below).

2 and 3 are frustrating because they happen at a layer of EC2 that you have little visibility into. The best way to avoid problems is to test everything at the expected real user load. In our case, when we were planning a change that would dramatically increase the number of clients connecting and doing real work, we first used our experimentation system to have a set of clients establish a dummy connection, then gradually ramped up the number of clients in the experiment as we worked through issues (the ramp sketch at the end shows the general shape).
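The thundering herd from point 1 is easiest to soften on the client side. Here's a minimal reconnect-backoff sketch in Rust, assuming the `rand` crate; the constants are illustrative, not anything autopush actually uses:

```rust
use rand::Rng;
use std::time::Duration;

/// Exponential backoff with "full jitter": each retry waits a random
/// amount between zero and the current cap, so the clients of a restarted
/// instance spread their reconnects out instead of arriving all at once.
fn reconnect_delay(attempt: u32) -> Duration {
    let cap_ms = 500u64.saturating_mul(1u64 << attempt.min(6)); // tops out ~32s
    let jitter_ms = rand::thread_rng().gen_range(0..=cap_ms);
    Duration::from_millis(jitter_ms)
}
```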
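For "distribute connections itself" in point 3, one simple approach is to hash a stable client id onto the current host list and hand the client the host to dial. The host names and `pick_host` helper below are a hypothetical sketch, not autopush's actual scheme:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Map a stable client id onto one of the websocket hosts. With a fixed
// host list this spreads connections evenly; a production version would
// want consistent or rendezvous hashing so that adding or removing a
// host only moves a small fraction of clients.
fn pick_host<'a>(client_id: &str, hosts: &'a [&'a str]) -> &'a str {
    let mut h = DefaultHasher::new();
    client_id.hash(&mut h);
    hosts[(h.finish() % hosts.len() as u64) as usize]
}

fn main() {
    let hosts = ["ws1.example.com", "ws2.example.com", "ws3.example.com"];
    println!("connect to wss://{}/", pick_host("client-1234", &hosts));
}
```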
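And the dummy-connection ramp from the last paragraph, sketched with tokio and tokio-tungstenite. The URL, batch sizes, and interval are placeholders; a real harness would also answer server pings and count failures rather than just logging them:

```rust
use std::time::Duration;
use tokio_tungstenite::connect_async;

#[tokio::main]
async fn main() {
    let url = "wss://push.example.com/ws"; // placeholder endpoint
    for batch in 1..=10u64 {
        for _ in 0..1_000 {
            let url = url.to_string();
            tokio::spawn(async move {
                match connect_async(url).await {
                    // Hold the dummy connection open; a real harness would
                    // respond to pings so the server keeps it established.
                    Ok((_ws, _resp)) => std::future::pending::<()>().await,
                    Err(e) => eprintln!("connect failed: {e}"),
                }
            });
        }
        eprintln!("ramped to ~{} connections", batch * 1_000);
        tokio::time::sleep(Duration::from_secs(30)).await; // watch dashboards here
    }
    std::future::pending::<()>().await // keep the full load applied
}
```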