A few pieces of advice based on running https://github.com/mozilla-services/autopush-rs, which handles tens of millions of concurrent connections across a fleet of small EC2 instances.

1) Consider distributing your workload across smaller instances rather than running the largest instance that can handle it. This allows progressive rollouts to test new versions, reduces the thundering herd when you restart or replace an instance (a client-side jitter sketch follows below), etc.

2) Don't set up security group rules that restrict which addresses can connect to your websocket port. As soon as you do, connection tracking kicks in and you'll hit undocumented hard limits on the number of established connections to an instance. These limits vary with instance size and can easily become your bottleneck.

3) Beware of ELBs. Under the hood an ELB is made of multiple load balancers and is supposed to scale out when those load balancers hit capacity. A single load balancer can only handle so many concurrent connections, and in my experience ELBs don't automatically scale out when that limit is reached; you need AWS support to do it manually for you. At a certain traffic level, expect support to tell you to create multiple ELBs and distribute traffic across them yourself. ALBs or NLBs may handle this better; I'm not sure. If possible, design your system to distribute connections itself instead of requiring a load balancer (see the hashing sketch below).

2 and 3 are frustrating because they happen at a layer of EC2 that you have little visibility into. The best way to avoid problems is to test everything at the expected real user load. In our case, when we were planning a change that would dramatically increase the number of clients connecting and doing real work, we first used our experimentation system to have a set of clients establish a dummy connection, then gradually ramped up the number of clients in the experiment as we worked through issues (the ramp sketch at the end shows the general shape).
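The thundering herd from point 1 is easiest to soften on the client side. Here's a minimal reconnect-backoff sketch in Rust, assuming the `rand` crate; the constants are illustrative, not anything autopush actually uses:

```rust
use rand::Rng;
use std::time::Duration;

/// Exponential backoff with "full jitter": each retry waits a random
/// amount between zero and the current cap, so the clients of a restarted
/// instance spread their reconnects out instead of arriving all at once.
fn reconnect_delay(attempt: u32) -> Duration {
    let cap_ms = 500u64.saturating_mul(1u64 << attempt.min(6)); // tops out ~32s
    let jitter_ms = rand::thread_rng().gen_range(0..=cap_ms);
    Duration::from_millis(jitter_ms)
}
```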
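For "distribute connections itself" in point 3, one simple approach is to hash a stable client id onto the current host list and hand the client the host to dial. The host names and `pick_host` helper below are a hypothetical sketch, not autopush's actual scheme:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Map a stable client id onto one of the websocket hosts. With a fixed
// host list this spreads connections evenly; a production version would
// want consistent or rendezvous hashing so that adding or removing a
// host only moves a small fraction of clients.
fn pick_host<'a>(client_id: &str, hosts: &'a [&'a str]) -> &'a str {
    let mut h = DefaultHasher::new();
    client_id.hash(&mut h);
    hosts[(h.finish() % hosts.len() as u64) as usize]
}

fn main() {
    let hosts = ["ws1.example.com", "ws2.example.com", "ws3.example.com"];
    println!("connect to wss://{}/", pick_host("client-1234", &hosts));
}
```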
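And the dummy-connection ramp from the last paragraph, sketched with tokio and tokio-tungstenite. The URL, batch sizes, and interval are placeholders; a real harness would also answer server pings and count failures rather than just logging them:

```rust
use std::time::Duration;
use tokio_tungstenite::connect_async;

#[tokio::main]
async fn main() {
    let url = "wss://push.example.com/ws"; // placeholder endpoint
    for batch in 1..=10u64 {
        for _ in 0..1_000 {
            let url = url.to_string();
            tokio::spawn(async move {
                match connect_async(url).await {
                    // Hold the dummy connection open; a real harness would
                    // respond to pings so the server keeps it established.
                    Ok((_ws, _resp)) => std::future::pending::<()>().await,
                    Err(e) => eprintln!("connect failed: {e}"),
                }
            });
        }
        eprintln!("ramped to ~{} connections", batch * 1_000);
        tokio::time::sleep(Duration::from_secs(30)).await; // watch dashboards here
    }
    std::future::pending::<()>().await // keep the full load applied
}
```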