How Discord Scaled Elixir to 5M Concurrent Users

802 points by b1naryth1ef almost 8 years ago

40 comments

iagooar almost 8 years ago

This writeup makes me even more convinced that Elixir will become one of the large players when it comes to hugely scaling applications.

If there is one thing I truly love about Elixir, it is how easy it is to get started while standing on the shoulders of a giant, the Erlang VM. You can start by building a simple, not very demanding application with it, yet once you hit a large scale, there are plenty of battle-proven tools to save you massive headaches and costly rewrites.

Still, I feel that using Elixir today is a large bet. You need to convince your colleagues as much as your bosses / customers to take the risk. But you can rest assured it will not fail you when you need to push it to the next level.

Nothing comes for free, and at the right scale, even the Erlang VM is not a silver bullet and will require your engineering team to invest their talent, time and effort to fine-tune it. Yet, once you dig deep enough into it, you'll find plenty of ways to solve your problem at a lower cost compared to other solutions.

I see a bright future for Elixir, and a breath of fresh air for Erlang. It's such a great time to be alive!

jakebasile almost 8 years ago

I'm continually impressed with Discord, and their technical blogs contribute to my respect for them. I use it in both my personal life (I run a small server for online friends, plus large game-centric servers) and my professional life (instead of Slack). It's a delight to use: the voice chat is extremely high quality, text chat is fast and searchable, and notifications actually work. Discord has become the de facto place for many gaming communities to organize, which is a big deal considering how discriminating and exacting PC gamers can be.

My only concern is their long-term viability, and I don't just mean money-wise. I'm concerned they'll have to sacrifice the user experience to either achieve sustainability or consent to a buyout by a larger company that only wants the users and brand. I hope I'm wrong, and I bought a year of Nitro to do my part.

Cieplak almost 8 years ago

I know that the JVM is a modern marvel of software engineering, so I'm always surprised when my Erlang apps consume less than 10MB of RAM, start up nearly instantaneously, respond to HTTP requests in less than 10ms and run forever, while my Java apps take 2 minutes to start up, have several hundred milliseconds of HTTP response latency and hoard memory. Granted, it's more an issue with Spring than with Java, and Parallel Universe's Quasar is basically OTP for Java, so I know logically that Java is basically a superset of Erlang at this point, but perhaps there's an element of "less is more" going on here.

Also, we're looking for Erlang folks with payments experience.

cGF0cmljaytobkBmaW5peHBheW1lbnRzLmNvbQ==

rdtsc almost 8 years ago

Good stuff. Erlang VM FTW!

> mochiglobal, a module that exploits a feature of the VM: if Erlang sees a function that always returns the same constant data, it puts that data into a read-only shared heap that processes can access without copying the data

There is a nice new OTP 20.0 optimization: now the value doesn't get copied even on message sends on the local node. Jesper L. Andersen (jlouis) talked about it in his blog: https://medium.com/@jlouis666/an-erlang-otp-20-0-optimization-efde8b20cba7

> After some research we stumbled upon :ets.update_counter/4

Might not help in this case, but 20.0 adds select_replace, so you can do a full-on CAS (compare-and-swap) pattern (http://erlang.org/doc/man/ets.html#select_replace-2). So something like acquiring a lock would be much easier to do.

> We found that the wall clock time of a single send/2 call could range from 30μs to 70μs due to Erlang de-scheduling the calling process.

There are a few tricks the VM uses there, and it's pretty configurable. For example, sending to a process with a long message queue will add a bit of backpressure to the sender and un-schedule it.

There are tons of configuration settings for the scheduler. There is an option to bind schedulers to physical cores to reduce the chance of scheduler threads jumping around between cores: http://erlang.org/doc/man/erl.html#+sbt Sometimes it helps, sometimes it doesn't.

Another general trick is to build the VM with the lcnt feature. This adds performance counters for locks / semaphores in the VM, so you can check for the hotspots and know where to optimize: http://erlang.org/doc/man/lcnt.html
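
A rough sketch of the compare-and-swap lock pattern mentioned above, written in Python rather than Erlang. It does not use ETS or select_replace; `CasCell`, `try_acquire` and `release` are hypothetical names, and the internal `threading.Lock` only stands in for the atomicity that the table operation would provide.

```python
import threading


class CasCell:
    """Hypothetical stand-in for a single table row; not an Erlang/ETS API."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()  # makes the compare and the swap one atomic step

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def get(self):
        with self._guard:
            return self._value


def try_acquire(cell, owner):
    # Succeeds only if nobody holds the lock right now (value is None).
    return cell.compare_and_swap(None, owner)


def release(cell, owner):
    # Only the current owner can release the lock.
    return cell.compare_and_swap(owner, None)
```

The point of the pattern is that a caller never reads the value and then writes it in two separate steps, so two callers cannot both believe they took the lock.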

mbesto almost 8 years ago

This is one of those few instances where getting the technology choice right actually has an impact on cost of operations, service reliability, and overall experience of a product. For like 80% of all the other cases, it doesn't matter what you use as long as your devs are comfortable with it.

jlouis almost 8 years ago

A fun idea is to do away with the "guild" servers in the architecture and simply run message passes from the websocket process over the Manifold system. A little bit of ETS work should make this doable, and now an eager sending process is paying for the work itself, slowing it down. This is exactly the behavior you want. If you are a bit more sinister, you also format most of the message in the sending process and make it into a binary. This ensures data is passed by reference and not copied in the system. It ought to bring message sends down to about funcall overhead if done right.

It is probably not a solution for current Discord, as they rely on linearizability, but I toyed with building an IRCd in Erlang years ago, and there we managed to avoid having a process per channel in the system via the above trick.

As for the "hoops you have to jump through", that is usually true in any language. When a system experiences pressure, how easy it is to deal with that pressure is usually what matters. Other languages are "phase shifts": while certain things become simpler in that language, other things become much harder to pull off.
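
A loose Python analogy for the "format once in the sender, pass by reference" trick described above. This is only a sketch under assumptions: `fan_out` and the list-based inboxes are hypothetical, and Python object references are not the same mechanism as the BEAM's reference-counted binaries, which are what make the original trick cheap.

```python
import json


def fan_out(event, inboxes):
    """Encode `event` once, then hand the same bytes object to every inbox."""
    # The sender pays the serialization cost exactly once.
    payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
    for inbox in inboxes:
        inbox.append(payload)  # appends a reference to the same object, not a copy
    return payload
```

With three recipients, all three inboxes end up holding the identical bytes object, so the per-recipient cost is a pointer append rather than a re-encode or a copy.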

danso almost 8 years ago

According to Wikipedia, Discord's initial release was March 2015. Elixir hit 1.0 in September 2014 [0]. That's impressively early for adoption of a language for prototyping and for production.

[0] https://github.com/elixir-lang/elixir/releases/tag/v1.0.0

didibus almost 8 years ago

So, at this point, every language has been scaled to very high concurrent loads. What does that tell us? Sounds to me like languages don't matter for scale. In fact, that makes sense: scale is all about parallel processes, and horizontally distributing work can be achieved in any language. Scale is not like performance, where if you need it, you are restricted to only a few languages.

That's why I'd like to hear more about productivity and ease now. Is it faster and more fun to scale things in certain languages than in others? BEAM is modeled on actors and offers no alternatives. Java offers all sorts of models, including actors, but if actors are currently the most fun and productive way to scale, that doesn't matter.

Anyway, learning how teams scaled is interesting, but it's clear to me now that languages aren't the limiting factor for scale.

jmcgough almost 8 years ago

Great to see more posts like this promoting Elixir. I've been really enjoying the language and how much power it gets from BEAM.

Hopefully more companies see success stories like this and take the plunge. I'm working on an Elixir project right now at my startup and am loving it.

ShaneWilton almost 8 years ago

Thanks for putting this writeup together! I use Elixir and Erlang every day at work, and the Discord blog has been incredibly useful in terms of pointing me towards the right tooling when I run into a weird performance bottleneck.

FastGlobal in particular looks like it nicely solves a problem I've manually had to work around in the past. I'll probably be pulling that into our codebase soon.

joonoro almost 8 years ago

Elixir was one of the reasons I started using Discord in the first place. I figured if they were smart enough to use Elixir for a program like this, then they would probably have a bright future ahead of them.

In practice, Discord hasn't been completely reliable for my group. Lately messages have been dropping out or being sent multiple times. Voice gets messed up (robot voice) at least a couple of times per week and we have to switch servers to make it work again. A few times a person's voice connection has stopped working completely for several minutes and there's nothing we can do about it.

I don't know if these problems have anything to do with the Elixir backend or the server.

EDIT: Grammar

ConanRus almost 8 years ago

I do not see anything Elixir-specific there; it is all basically Erlang/Erlang VM/OTP stuff. When you're using Erlang, you think in terms of actors/processes and message passing, and this is (IMHO) a natural way of thinking about distributed systems. So this article is a perfect example of how simple solutions can solve scalability issues if you're using the right platform for that.

majidazimi almost 8 years ago

It seems awkward to me. What if the Erlang/OTP team cannot guarantee message serialization compatibility across a major release? How are you going to upgrade a cluster one node at a time? What if you want to communicate with other platforms? How are you going to modify the distribution protocol on a running cluster without downtime?

As soon as you introduce a standard message format, all the nice features such as built-in distribution, automatic reconnect, ... are almost useless. You have to do all of these manually. Maybe I'm missing something. Correct me if I'm wrong.

For a fast time to market it seems like quite a nice approach. But for a long-running, maintainable back-end it is not enough.

_ar7 almost 8 years ago

Really liked the blog post. Elixir and the capabilities of the BEAM VM seem really awesome, but I can't really find an excuse to use them in my day-to-day anywhere.

StreamBright almost 8 years ago

WhatsApp's story is somewhat similar. A relevant read on this subject: http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf

brian_herman almost 8 years ago

I love Discord's posts; they are very informative and easy to read.

OOPMan almost 8 years ago

5 million concurrent users is great and all, but it would be nice if Discord could work out how to use WebSockets without duplicating sent messages.

This seems to happen a lot when you are switching between wireless networks (e.g. my home router has 2.4GHz and 5GHz wireless networks) or when you're on mobile (it seems to happen regularly, even if you're not moving around).

It's terribly annoying and makes using the app via the mobile client very tedious.

sriram_malhar almost 8 years ago

I really like Elixir the language, but find myself strangely hamstrung by the _mix_ tool. There is only an introduction to the tool, not a reference to all its bells and whistles. I'm not looking for extra bells and whistles, just simple stuff like pulling in a module from GitHub and incorporating it. Is there such documentation? How do you crack Mix?

renaudg almost 8 years ago

It looks like they have built an interesting, robust and scalable system which is perfectly tailored to their needs.

If one didn't want to build all of that in house, though, is there anything they've described here that an off-the-shelf system like https://socketcluster.io doesn't provide?

etblg almost 8 years ago

Reading posts like this about widely distributed applications always gets me interested in it as a career path. Currently I'm working as a front-end dev with moderate non-distributed back-end experience. How would someone in my situation, with no distributed back-end experience, break into a position working on something like Discord?

omeid2 almost 8 years ago

I think while this is great, it is good to remember that your current tech stack may be just fine! After all, Discord started with MongoDB [0].

[0] https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7

alberth almost 8 years ago

Is there any update on BEAMJIT?

It was super promising 3 or so years ago, but I haven't seen an update.

Erlang is amazing in numerous ways, but raw performance is not one of them. BEAMJIT is a project to address exactly that.

https://www.sics.se/projects/beamjit

ramchip almost 8 years ago

Very interesting article! One thing I'm curious about is how to ensure a given guild's process only runs on one node at a time, and that the ring is consistent between nodes.

Do you use an external system like ZooKeeper? Or do you have very reliable networking and consider netsplits a tolerable risk?
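
For readers unfamiliar with the "ring" being asked about, here is a minimal consistent-hash ring sketch in Python. It is not Discord's implementation: `HashRing`, the `vnodes` parameter, and the node and guild names are all hypothetical. It only shows how a key such as a guild ID maps to one node, and it does not address the actual question of how nodes agree on ring membership during a netsplit.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash, wrapping around at the end.
        i = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

Two nodes that hold the same membership list compute the same owner for every key, and removing a node only moves the keys that node owned. If two nodes disagree about membership, they can disagree about ownership, which is exactly the risk the question raises.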

myth_drannon almost 8 years ago

It's interesting how on StackOverflow Jobs, Elixir knowledge is required more often than Erlang.

http://www.reallyhyped.com/?keywords=erlang%2Celixir

andy_ppp almost 8 years ago

Just as an aside, how would people build something like this if they were to use, say, Python and try to scale to this sort of user level? Has anyone succeeded? I'd say it would be quite a struggle without some seriously clever work!

neya almost 8 years ago

Hi community,

Let me share my experience with you. I'm a hardcore Rails guy and I've been advocating and teaching Rails to the community for years.

My workflow for trying out a new language involves using the language for a small side project and gradually trying to scale it up. So, here's a summary of my experience with all the languages so far:

Scala - It's a vast academic language (the official book is ~700 pages) with multiple ways of doing things, and its attraction for me was the JVM. It's proven, robust and highly scalable. However, the language was not quite easy to understand, and the frameworks that I tried (Play 2, Lift) weren't as easy to transition to for a Rails developer like me. Nevertheless, I did build a simple calendar application, but it took me 2 months to learn the language and build it.

GoLang - This was my next bet. Although I didn't give up on Scala completely (I know it has its uses), I wanted something simple. I used Go and had the same experience as I had when I used C++. It's a fine language, but for a simple language, I had to fight a lot with configuration to get it working for me (for example, it has this crazy concept of GOPATH where your project should reside, and if your project isn't there it'll keep complaining). Nevertheless, I built my own (simple) Rails clone in Go and realized this isn't what I was looking for. It took me about a month to conquer the language and build my (simple) side project.

Elixir - Finally, I heard of Elixir on multiple HN Rails release threads and decided to give it a go. I started off with Phoenix. The transition from Rails was definitely way smoother, especially considering the founding member of this language was a Rails dev himself (the author of the "devise" gem). At first some concepts seemed different (like piping), but once I got used to it, there was no looking back.

All was fine until they released Phoenix 1.3, where they introduced the concept of contexts and (re)introduced Umbrella applications. Basically, they encourage you to break your application into smaller applications by business function (similar to microservices), except that you can do this however you like (unopinionated). For example, I broke down my application by business units (Finance, Marketing, etc.). This forced me to re-think my application in a way I never would have thought of, and by this time I had finished reading all 3 popular books on this topic (Domain Driven Design). I loved the fact that Elixir's design choices are really well suited for DDD. If you're new to DDD, I suggest you give it a shot; it really can force you to re-think the way you develop software.

By the end of two weeks after being introduced to Elixir, I had picked up the language. In a month and a half, I built a complete Salesforce clone just working on the weekends. And this includes even the UI. And I love how my application is always blazing fast, picks up errors even before it compiles, and warns me if I'm not using a variable I defined somewhere.

P.S. There IS a small learning curve involved if you're starting out fresh:

1) If you're used to the Rails asset pipeline, you'll need to learn some new tools like Brunch / Webpack / etc.
2) Understand contexts & DDD (optional) if you want to better architect your application.
3) There is no return statement in Elixir!

As a Ruby developer, here are my thoughts:

1. So, will I be developing with Rails again? Probably yes, for simpler applications / API servers.
2. Is Ruby dying? No. In fact, I can't wait for Ruby 3.

Some drawbacks of Elixir:

1. Relatively new, so sometimes you'll be on your own, and that's okay.
2. Fewer libraries compared to the Ruby ecosystem. But you can easily write your own.
3. Fewer developers, but it should be fairly easy to onboard Ruby developers.

Cheers.

oldpond almost 8 years ago

When have you ever read, "How Acme scaled J2EE to 5M Concurrent Users"? I became an IT architect in 1998 at IBM, the year Sun released J2EE and IBM released WebSphere. I have experienced 20 years of enterprise Java and object-oriented computing, and I was thrilled when Elixir came out. I was a mainframe programmer before OO became all the rage, so I never really felt at home doing objects. Functional programming feels completely natural to me, though.

What I like about this article is that they shared everything they learned with the community. Thank you for that excellent experience report.

grantwu almost 8 years ago

"Discord clients depend on linearizability of events"

Could this possibly be the cause of the message reordering and dropping that I experience when I'm on a spotty connection?

agentgt almost 8 years ago

I realize this is off topic but how does Discord make money? I can't figure out their biz model (I'm not a gamer so I didn't even know about them).

jaequery almost 8 years ago

Anyone know if Phoenix/Elixir has something similar to Ruby's better_errors gem? I see Phoenix has a built-in error stack trace page which looks like a clone of better_errors, but it doesn't have the real-time console inside of it.

Also, I wish they had an ORM like Sequel. These two are really what is holding me back from going all in on Elixir. Anyone care to comment on this?

zitterbewegung almost 8 years ago

Compared to Slack, Discord is a much better service for large groups. Facebook uses it for React.

concatime almost 8 years ago

Sad to see some people taking raw and insignificant benchmarks to evaluate a language [0].

[0] https://news.ycombinator.com/item?id=14479757

framp almost 8 years ago

Really lovely post!

I wonder how Cloud Haskell would fare in such a scenario.

brightball almost 8 years ago

I so appreciate write-ups that get into the details of microsecond-sized performance gains at that scale. It's a huge help for the community.

dandare almost 8 years ago

What is the business model behind Discord? They boast about being free multiple times, how do they make money? Or plan to make money?

KrishnaHarish almost 8 years ago

Scale!

KrishnaHarish almost 8 years ago

What are Discord and Elixir?

marlokk almost 8 years ago

"How Discord Scaled Elixir to 5M Concurrent Users"

click link

[Error 504 Gateway time-out]

only on Hacker News

orliesaurus almost 8 years ago

Unlike Discord's design team, who seem to just copy all of Slack's designs and assets, the engineering team seems to have their shit together. It is delightful to read your Elixir blog posts. Good job!

khanan almost 8 years ago

Problem is that Discord sucks since it does not have a dedicated server. Sorry, move along.
