Looks like the big challenge is managing a large LRU cache, which tends to be a difficult problem for GC runtimes. I bet the JVM, with its myriad tunable GC algorithms, would perform better, especially Shenandoah and, of course, the Azul C4.<p>The JVM world tends to solve this problem by using off-heap caches. See Apache Ignite [0] or Ehcache [1].<p>I can't speak to how their Rust cache manages memory, but the thing to be careful of in non-GC runtimes (or more precisely, anything without a compacting collector) is memory fragmentation.<p>It's worth mentioning that the Dgraph folks wrote a better Go cache [2] once they hit the limits of the usual Go caches.<p>From a purely architectural perspective, I would try to put cacheable material in something like memcached or Redis, or one of the many distributed caches out there. But that might not be an option.<p>It's also worth mentioning that Apache Cassandra itself uses an off-heap cache.<p>[0]: <a href="https://ignite.apache.org/arch/durablememory.html" rel="nofollow">https://ignite.apache.org/arch/durablememory.html</a>
[1]: <a href="https://www.ehcache.org/documentation/2.8/get-started/storage-options.html#bigmemory-(off-heap-store)" rel="nofollow">https://www.ehcache.org/documentation/2.8/get-started/storag...</a>
[2]: <a href="https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/" rel="nofollow">https://blog.dgraph.io/post/introducing-ristretto-high-perf-...</a>
Seems like you were hitting "runtime: Large maps cause significant GC pauses" #9477 [0].<p>Looks like this issue was resolved for maps that don't contain pointers by [1]. From the article, it sounds like the map keys were strings (which do contain pointers, so the map would need to be scanned by the GC).<p>If pointers in the map keys and values could be avoided, it would (if my understanding is correct) have removed the need for the GC to scan the map. You could do this, for example, by replacing string keys with fixed-size byte arrays. Curious if you experimented with this approach?<p>[0] <a href="https://github.com/golang/go/issues/9477" rel="nofollow">https://github.com/golang/go/issues/9477</a><p>[1] <a href="https://go-review.googlesource.com/c/go/+/3288" rel="nofollow">https://go-review.googlesource.com/c/go/+/3288</a>
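Here's a minimal sketch of that idea; the readState record and the 16-byte key width are made up for illustration. Since neither the key nor the value contains a pointer, the GC (per [1]) can skip scanning the map's buckets entirely:

```go
package main

import "fmt"

// A hypothetical read-state record. All fields are plain values, so the
// struct contains no pointers and the GC never has to scan the map.
type readState struct {
	LastMessageID uint64
	MentionCount  uint32
}

func main() {
	// string keys contain a pointer to their backing array, forcing the GC
	// to scan every bucket. A fixed-size byte array key is pointer-free.
	cache := make(map[[16]byte]readState)

	var key [16]byte
	copy(key[:], "user:1234")

	cache[key] = readState{LastMessageID: 42, MentionCount: 1}
	fmt.Println(cache[key])
}
```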
Tokio author here (mentioned in the blog post). It is really great to see these success stories.<p>I also think it is great that Discord is using the right tool for the job. It isn't often that you <i>need</i> the performance gains that Rust & Tokio provide, so pick what works best to get the job done and iterate.
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.<p>Collections are one of the big areas where Go's lack of generics really hurts it. In Go, if one of the built-in collections does not meet your needs, you are going to take a safety and ergonomics hit going to a custom collection. In Rust, if one of the standard collections does not meet your needs, you (or someone else) can create a pretty much drop-in replacement that does, with similar ergonomic and safety profiles.
If you have a problem at hand which does not really benefit from the presence of a garbage collector, switching to an implementation without a garbage collector has quite a potential to be at least somewhat faster. I remember running into this garbage collection time trigger long ago - though I don't remember why, and had mostly forgotten about it until I read this article. As also written in the article, even if there are no allocations going on, Go forces a GC every two minutes; it is set here: <a href="https://golang.org/src/runtime/proc.go#L4268" rel="nofollow">https://golang.org/src/runtime/proc.go#L4268</a><p>The idea behind this (if I remember correctly) is to be able to return unused memory to the OS. As returning memory requires a GC to run, one is forced at regular intervals. I am a bit surprised that they didn't contact the corresponding Go developers, as they seem interested in practical use cases where the GC doesn't perform well. And given that newer Go releases have improved GC performance, I am also a bit surprised that they didn't just increase this time interval to an arbitrarily large number and check whether their issues went away.
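You can watch these forced collections happen with a few lines of Go - a minimal sketch using only the standard library. The program allocates essentially nothing, yet NumGC keeps ticking up roughly every two minutes because of forcegcperiod:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Observes the periodic forced GC: with no allocation pressure at all,
// the runtime still starts a collection if none has run for two minutes.
func main() {
	var stats runtime.MemStats
	for {
		runtime.ReadMemStats(&stats)
		last := time.Unix(0, int64(stats.LastGC)) // LastGC is ns since epoch
		fmt.Printf("completed GC cycles: %d, last GC at: %s\n",
			stats.NumGC, last.Format(time.RFC3339))
		time.Sleep(30 * time.Second)
	}
}
```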
This seems like a nice microservices success story. It's so easy to replace a low-performing piece of infrastructure when it is just a component with a well-defined API. Spin up the new version, mirror some requests to see how it performs, and turn off the old one. No drama, no year-long rewrites. Just a simple fix for the component that needed it the most.
> After digging through the Go source code, we learned that Go will force a garbage collection run every 2 minutes at minimum. In other words, if garbage collection has not run for 2 minutes, regardless of heap growth, go will still force a garbage collection.<p>> We figured we could tune the garbage collector to happen more often in order to prevent large spikes, so we implemented an endpoint on the service to change the garbage collector GC Percent on the fly. Unfortunately, no matter how we configured the GC percent nothing changed. How could that be? It turns out, it was because we were not allocating memory quickly enough for it to force garbage collection to happen more often.<p>As someone not too familiar with GC design, this seems like an absurd hack. That this 2-minute hardcoded limit is not even configurable comes across as amateurish. I have no experience with Go -- do people simply live with this and not talk about it?
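For context, the on-the-fly knob the post describes maps to debug.SetGCPercent. A minimal sketch of such an endpoint (the /gcpercent route and pct parameter are made up for illustration):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime/debug"
	"strconv"
)

// An HTTP endpoint that adjusts the GC target percentage (the GOGC
// setting) at runtime, in the spirit of what the post describes.
func main() {
	http.HandleFunc("/gcpercent", func(w http.ResponseWriter, r *http.Request) {
		pct, err := strconv.Atoi(r.URL.Query().Get("pct"))
		if err != nil {
			http.Error(w, "pct must be an integer", http.StatusBadRequest)
			return
		}
		old := debug.SetGCPercent(pct) // returns the previous setting
		fmt.Fprintf(w, "GC percent changed from %d to %d\n", old, pct)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

As the post explains, though, this knob only moves the heap-growth trigger; the 2-minute wall-clock trigger is a separate code path it doesn't touch.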
It should also be noted that Rust interoperates extremely well (via Rustler) with Erlang/Elixir, which Discord's infrastructure is built on.<p><a href="https://github.com/rusterlium/rustler" rel="nofollow">https://github.com/rusterlium/rustler</a><p><a href="https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b" rel="nofollow">https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b</a>
It's always good to see a case-study/anecdote, but nothing in here is surprising. It also doesn't really invalidate Go in any way.<p>Rust is faster than Go. People use Go, like any other technology, when the tradeoffs between developer iteration/throughput/latency/etc. make sense. When those cease to make sense, a hot path gets converted down to something more efficient. This is the natural way of things.
I'm glad they found a good solution (Rust) to solve their problem!<p>Also note this was with Go 1.9. I know GC work was ongoing during that time; I wonder if this type of situation would still happen?
I chose Rust over Go after weighing the pros and cons. It was an easy decision. I wouldn't consider using a high-level language that lacks generics. The entire point of using a high-level language is writing less code.
It'd be cool to look at more signal statistics from the CPU plot.<p>It appears that Go has a lower CPU floor, but it's killed by the GC spikes, presumably due to the large cache mentioned by the author.<p>This is interesting to me. It suggests that Rust is better at scale than Go; I would have thought that Go, with its mature concurrency model and implementation, would have been optimized for such cases, while Rust would shine in smaller services with CPU-bound problems.<p>Great post!
Why /does/ it run a GC every 2 minutes? I went looking and didn't find a reason in the code...<p><a href="https://github.com/golang/go/search?q=forcegcperiod&unscoped_q=forcegcperiod" rel="nofollow">https://github.com/golang/go/search?q=forcegcperiod&unscoped...</a><p>Go's GC seems kind of primitive.
Really interesting post; however, they were using a 2+ year old runtime (Go 1.9.2 was released 2017/10/25). Why did they not even try Go 1.13?<p>For me the interesting part is that their new Rust implementation, with a new data structure, is less than 2x faster than the Go implementation on a 2+ year old runtime.<p>It shows how fast Go is vs. a heavily optimized language + a new data structure with no GC.<p>Overall I'm pretty sure there was a way to make the spikes go away.<p>Still a great post.
I'm curious what the product-engineering landscape in the company looks like to allow for a language rewrite to happen. I feel like this would be a hard sell in all companies I've worked at. Was this framed as a big bug fix? Or was faster performance framed as a feature?
The Twitch folks were facing a related situation with the GC. They developed a workaround that they call a ballast, reducing overall latency and making it more predictable. Quite impressive results [0].<p>Go's GC is groundbreaking in several aspects, but it probably needs to provide ways to fine-tune it. Posts like this make me believe one-size-fits-all GC settings are yet to be achieved.<p>[0]: <a href="https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/" rel="nofollow">https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...</a>
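The trick itself is tiny. A minimal sketch (not Twitch's actual code; the 10 GiB size is arbitrary):

```go
package main

import (
	"runtime"
	"time"
)

// The ballast trick: allocate a large, never-used byte slice at startup.
// It raises the live-heap baseline the GC paces against (GOGC is relative
// to the live heap), so collections run far less often. Because the slice
// is never written, the OS never faults the pages in, so it costs almost
// no physical memory.
func main() {
	ballast := make([]byte, 10<<30) // 10 GiB of virtual address space

	// ... run the real workload here ...
	time.Sleep(time.Minute)

	runtime.KeepAlive(ballast) // keep the ballast from being collected
}
```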
Non-programmer here, but would it make sense to add a keyword (or flag) to Go to manually allocate a piece of memory (i.e., not use the GC)?
That way, for some use cases, you could avoid GC for the critical path. Then when GC happened, it could be very fast, as there would be far less to pause-and-scan (in this use case, for example).
Obviously this would have to be optional and discouraged... but otherwise there seems to be no way to write an intensive real-time app in a GC-based language.
(again non-programmer that is writing this to learn more ;-)
Go is not a general-purpose language. It's a Google language designed to solve Google's problems. If you aren't Google, you probably have different problems, which Go isn't intended to solve.<p>EDIT: Currently at -4 downvotes. Would downvoters care to discuss their votes?
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.<p>Can someone explain to me how BTreeMap is more memory efficient than a HashMap?
When I see this kind of GC performance, I wonder why you wouldn't change the implementation to use some sort of pool allocator. I am guessing each Read State object has the same layout (e.g. some kind of struct), so why not pre-allocate your memory budget of objects and just keep an unused list outside of your HashMap? In a way this is even closer to a ring where, upon eviction, you could write the object to disk (or Cassandra), re-initialise the memory, and then reuse the object for the new entry.<p>I suppose that won't stop the GC from scanning the memory, though... so maybe they had something akin to that. I assume that a company associated with games, and with some former games programmers, would have thought to use pool allocators. Honestly, if that strategy didn't work then I would be a bit frustrated with Go.<p>I have to say, out of all of the non-stop spamming of Rust I see on this site, this is definitely the first time I've thought to myself that this is a very appropriate use of the language. This kind of simple yet high-throughput workhorse of a system is a great match for Rust.
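A minimal sketch of that pre-allocated pool, with a made-up ReadState struct and pool size:

```go
package main

import "fmt"

// Allocate the whole memory budget of ReadState objects up front and
// recycle them through a free list, so steady-state operation performs
// no new allocations.
type ReadState struct {
	LastMessageID uint64
	MentionCount  uint32
}

type pool struct {
	slots []ReadState  // one big allocation for the entire budget
	free  []*ReadState // currently unused entries
}

func newPool(n int) *pool {
	p := &pool{slots: make([]ReadState, n)}
	for i := range p.slots {
		p.free = append(p.free, &p.slots[i])
	}
	return p
}

func (p *pool) get() *ReadState {
	if len(p.free) == 0 {
		return nil // budget exhausted; caller evicts an LRU entry first
	}
	rs := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	*rs = ReadState{} // re-initialise before reuse
	return rs
}

func (p *pool) put(rs *ReadState) { p.free = append(p.free, rs) }

func main() {
	p := newPool(1 << 20)
	rs := p.get()
	rs.MentionCount++
	p.put(rs)
	fmt.Println("pooled allocation round-trip done")
}
```

As noted above, though, the free list is full of pointers, so this cuts allocation churn but not the GC's scan work.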
Uhm, I'd suppose the service runs on one or more dedicated nodes - so there should be no competition for RAM (or, if a node runs multiple services, I'd expect a fixed memory amount to be available). In such an environment, each fixed-size LRU cache could just allocate a huge chunk of RAM for data + indices (index size is bounded by data size). That's nothing to do with the ownership model; it's just manually managed memory.<p>Yes, reality is more complex, since they probably have multi-socket servers/NUMA, which might add memory access latencies, and atomic updates to the LRU might require a locking scheme, which also isn't trivial (and where async Rust might be useful).
And this brings me back to my years-old nag: "OK, you've got GC, fine. But DO give me the option to hand-free specific memory when I want to. I don't consider hand allocation and deallocation as much of a pain as a GC gone wild."<p>This doesn't only go for Go.
I wish the article would show a graph of the Go heap usage. I'm reminded of this Cloudflare article [0] from a while back, where they created an example that seemed to exhibit similar performance issues when it created many small objects to be garbage collected. They solved it by using a pooled allocator instead of relying solely on the GC. I wonder if that would have been applicable to the Go version here.<p>[0] <a href="https://blog.cloudflare.com/recycling-memory-buffers-in-go/" rel="nofollow">https://blog.cloudflare.com/recycling-memory-buffers-in-go/</a>
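For anyone unfamiliar, the standard-library version of that idea is sync.Pool. A minimal sketch, recycling buffers instead of handing short-lived garbage to the GC:

```go
package main

import (
	"fmt"
	"sync"
)

// A pool of reusable scratch buffers; Get returns a recycled buffer when
// one is available, otherwise New allocates a fresh one.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 4096) },
}

func handle() {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // return the buffer for reuse

	// ... use buf as scratch space for one request ...
	fmt.Println("buffer capacity:", cap(buf))
}

func main() { handle() }
```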
Pretty amazing write up from Jesse. I really like how they maxed out Go first before even thinking about a rewrite in Rust. It turns out no-GC has pretty significant advantages in some cases.
Not to participate in the flaming-- but I'd love to hear some stats about compile times for the two versions of the service. (Excellent write-up by the way! Thanks!)
The one problem I'm curious about with channel-based chat applications, to which my google-fu has never led me in the right direction: how do you handle subscriptions?<p>I imagine a bunch of front-end servers managing open websocket connections, and also providing filtering/routing of newly published messages. Alas, it's probably best categorized as a multicast-to-server, multicast-to-user problem.<p>Anyways, if there's an elegant solution to this problem, I would love to learn more.
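One plausible shape for the per-server half of the problem - a toy sketch, with Conn standing in for a websocket connection and everything made up for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// Each gateway server tracks which of its local connections subscribe to
// which channel; a published message is only fanned out to subscribers.
type Conn interface{ Send(msg string) }

type Registry struct {
	mu   sync.RWMutex
	subs map[string]map[Conn]struct{} // channel ID -> subscribers
}

func NewRegistry() *Registry {
	return &Registry{subs: make(map[string]map[Conn]struct{})}
}

func (r *Registry) Subscribe(channel string, c Conn) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.subs[channel] == nil {
		r.subs[channel] = make(map[Conn]struct{})
	}
	r.subs[channel][c] = struct{}{}
}

func (r *Registry) Publish(channel, msg string) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for c := range r.subs[channel] {
		c.Send(msg)
	}
}

type stdoutConn struct{ name string }

func (s stdoutConn) Send(msg string) { fmt.Println(s.name, "<-", msg) }

func main() {
	r := NewRegistry()
	r.Subscribe("general", stdoutConn{"alice"})
	r.Subscribe("general", stdoutConn{"bob"})
	r.Publish("general", "hello")
}
```

The multicast-to-server half is then a routing layer that knows which servers hold at least one subscriber for a channel, so a message crosses the network once per interested server rather than once per user.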
This is a bit late to add, but from the description of the problem in the article, the way to make the program faster, regardless of language, is to use a big array rather than lists and trees. Carve the array up as necessary, with a map from users to offsets into the array where their data lives. Basically, be your own memory allocator, with all the loss of safety, but also the order-of-magnitude improvement in efficiency, that this brings.
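A minimal sketch of that, assuming a made-up 12-byte record layout (message ID + mention count). The GC sees one big allocation plus a pointer-free map it never has to scan:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// All record data lives in one []byte arena; a pointer-free map holds
// byte offsets into it instead of Go pointers.
type arena struct {
	buf     []byte
	offsets map[uint64]uint32 // user ID -> byte offset; no pointers inside
}

func newArena() *arena {
	return &arena{offsets: make(map[uint64]uint32)}
}

func (a *arena) put(userID, lastMsgID uint64, mentions uint32) {
	off, ok := a.offsets[userID]
	if !ok {
		off = uint32(len(a.buf))
		a.buf = append(a.buf, make([]byte, 12)...) // carve out a new slot
		a.offsets[userID] = off
	}
	binary.LittleEndian.PutUint64(a.buf[off:], lastMsgID)
	binary.LittleEndian.PutUint32(a.buf[off+8:], mentions)
}

func (a *arena) get(userID uint64) (lastMsgID uint64, mentions uint32, ok bool) {
	off, ok := a.offsets[userID]
	if !ok {
		return 0, 0, false
	}
	return binary.LittleEndian.Uint64(a.buf[off:]),
		binary.LittleEndian.Uint32(a.buf[off+8:]), true
}

func main() {
	a := newArena()
	a.put(1234, 42, 1)
	fmt.Println(a.get(1234))
}
```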
These kinds of posts would be much more interesting if they discussed alternatives considered and rejected. For example why did they choose Rust over C++?
Usually this kind of article is about "migrating from massively popular language to more niche language that we like better".<p>This is more from niche to niche. I thought that was interesting, and yet the discussion here wasn't all that different from the usual. Guess it's flamewars always, regardless of popularity.
The next step I expected after the LRU tuning was simple sharding per user, so that there are more service instances with smaller caches and hence smaller GC spikes, offset in time from each other (cancelling out the impact). I'm curious if that was considered and not done for some reason.
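A minimal sketch of the routing side of that idea (the shard count and key scheme are arbitrary). Note the GC win only materializes if the shards run as separate processes or services, so each heap stays small:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Split one big LRU into N independent shards keyed by a hash of the
// user ID, so each cache is a fraction of the total size.
const shardCount = 16

type cache map[string]uint64 // stand-in for the real LRU structure

type shardedCache [shardCount]cache

func newShardedCache() *shardedCache {
	var s shardedCache
	for i := range s {
		s[i] = make(cache)
	}
	return &s
}

func (s *shardedCache) shardFor(userID string) cache {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return s[h.Sum32()%shardCount]
}

func main() {
	s := newShardedCache()
	s.shardFor("user:1234")["user:1234"] = 42
	fmt.Println(s.shardFor("user:1234")["user:1234"])
}
```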
Switching to Rust is a good idea, but I was wondering- would it be possible to run two identical instances in parallel and return results from the fastest one? This would almost completely eliminate GC pauses from the final output.
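That's essentially the classic hedged-request trick. A minimal sketch of the idea, with sleeps standing in for real RPCs (one replica simulated as mid-GC-pause):

```go
package main

import (
	"fmt"
	"time"
)

// Query two identical backends and take whichever answers first,
// masking a GC pause on either one.
func query(backend string, delay time.Duration, out chan<- string) {
	time.Sleep(delay) // simulate service time
	select {
	case out <- backend:
	default: // the other backend already won; drop this result
	}
}

func hedged() string {
	out := make(chan string, 1)
	go query("replica-a", 5*time.Millisecond, out)
	go query("replica-b", 300*time.Millisecond, out)
	return <-out
}

func main() {
	fmt.Println("answer from:", hedged())
}
```

The cost is roughly double the load, and it only works cleanly for idempotent reads; updates like these read states would need more care.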
Curious about their definition of “response time” in the graph at the end. They’re quoting ~20 microseconds so I assume this doesn’t involve network hops? Is this just the CPU time it takes a Read State server to do one update?
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.<p>Why would BTreeMap be faster than HashMap?
It wouldn't be faster: HashMap lookups are O(1), while BTreeMap lookups are O(log N). The quoted line says they switched to optimize memory usage, not speed - a BTreeMap packs entries densely into nodes, while a hash table has to keep spare capacity to stay below its load factor.
Wait, didn't the Go devs say they solved the GC latency problems [1]?<p>(from 2015): "Go is building a garbage collector (GC) not only for 2015 but for 2025 and beyond: A GC that supports today’s software development and scales along with new software and hardware throughout the next decade. Such a future has no place for stop-the-world GC pauses, which have been an impediment to broader uses of safe and secure languages such as Go." [2]<p>[1] <a href="https://www.youtube.com/watch?v=aiv1JOfMjm0" rel="nofollow">https://www.youtube.com/watch?v=aiv1JOfMjm0</a><p>[2] <a href="https://blog.golang.org/go15gc" rel="nofollow">https://blog.golang.org/go15gc</a>
I'll raise the same question here: couldn't a memory pool be used in this case?<p>In the gaming industry there are similar problems with GC, and they were solved with memory pools.
> but because the garbage collector needed to scan the entire LRU cache in order to determine if the memory was truly free from references<p>Yeah, please tell me again how tracing GC is a superior solution to reference counting in cases where you know exactly when you no longer need the object.<p>(Hint: RC is not GC if the object deallocates itself)
This is consistent with my observations of porting Java code to Rust. Much simpler and nicer to read safe Rust code (no unsafe tricks) compiles to programs that outperform carefully tuned Java code.
This is not a fair comparison. Go 1.9.2 was released over 2 years ago. In that time they have fixed a lot of the GC stutter issues. Comparing Rust nightly to a 2-year-old compiler is unfair.
Would it have been better if they went with Elixir?<p>Write their code in a functional style. Get the benefits of the Erlang BEAM platform.<p>Their system runs over the web, so time sensitivity isn’t as important, in comparison to video games, VR, or AR.<p>Anyone ever done a performance comparison breakdown between something like Elixir vs. Rust?
Borrowed from a comment:<p>Garbage collection has gotten a lot of updates in the last 3 years. Why would you not take the exceedingly trivial step of just upgrading to the latest Go stable in order to at least <i>try</i> for the free win?
From the Go 1.12 release notes:
“Go 1.12 significantly improves the performance of sweeping when a large fraction of the heap remains live. This reduces allocation latency immediately following a garbage collection.” ¯\_(ツ)_/¯
This sounds like “we just wanted to try Rust, ok?” Which is fine. But like, just say that.
just use an off heap hash table. simple.
<a href="https://github.com/glycerine/offheap" rel="nofollow">https://github.com/glycerine/offheap</a><p>Also, as others have said, lots of big GC improvements were ignored by insisting on go1.9.2 and not the latest.
Can someone wake me up when they switch from JavaScript to something native in the <i>client</i>?<p>I just checked and, as usual, I have an entry labeled "Discord Helper (Not Responding)" in my process list. I don't think I've ever seen it in a normal state.
Confused - aren't they losing memory safety?<p>I get that for certain core code situations you want to manage all memory yourself (or rely on static, compile-time memory management), but beyond that it seems to me that at a higher level you'd rather have automatic GC. Why burden all of your developers rather than just a core few?<p>I don't think GC issues are a compelling argument to move everything to Rust. I'm not saying there aren't compelling arguments, but it just seems a bit odd that that's their main argument.
Wow, Rust is amazing, so fast! It's like these people never learnt C? Why did they spend all this time trying to optimise such a high-level language? Surely they can afford a more experienced engineer who would tell them that path isn't worth it? I jump straight to C when there is anything like this, although I guess Rust is an option these days.
Sounds like badly reinventing the wheel. If you need a large in-memory LRU cache, use memcached. Problem solved, because then Go doesn't need to allocate much memory anymore. And I'd wager that JSON serialization for sending a reply back to the client will dominate CPU load anyway, so that the overhead for Go to talk to Memcached will be barely noticeable.
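A sketch of what that looks like with the widely used bradfitz/gomemcache client; the key scheme and JSON encoding are made up for illustration:

```go
package main

import (
	"fmt"
	"log"

	"github.com/bradfitz/gomemcache/memcache"
)

// Keep read states in memcached so the Go process holds almost no
// long-lived heap for the GC to scan.
func main() {
	mc := memcache.New("127.0.0.1:11211")

	if err := mc.Set(&memcache.Item{
		Key:   "readstate:user:1234:chan:42",
		Value: []byte(`{"last_message_id":99,"mentions":1}`),
	}); err != nil {
		log.Fatal(err)
	}

	item, err := mc.Get("readstate:user:1234:chan:42")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(item.Value))
}
```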
When a company switches languages like this, it's usually because the engineers want to learn something new on the VC's dime. They'll make any excuse to do it. As many comments here show, there are other ways to solve this problem.
> Discord has never been afraid of embracing new technologies that look promising.<p>> Embracing the new async features in Rust nightly is another example of our willingness to embrace new, promising technology. As an engineering team, we decided it was worth using nightly Rust and we committed to running on nightly until async was fully supported on stable.<p>> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.<p>It is always an algorithm change
I wonder if they actually did their homework. It doesn't matter whether they liked it, but they could have avoided the rewrite if they'd wanted to.<p>The thing is, you can allocate memory outside of Go, and the GC will simply ignore such regions, since the GC only scans regions known to it. (Mmap should work like a charm here.) A drawback is that pointers in such regions will not be counted, but that's easy to work around by copying the data wholesale, which is encouraged by the language itself.<p>TBH, Go sucks for storing a large amount of data. As you can see here, even the simplest cache can be problematic. The language is biased towards large datacenters, where the amount of available resources is less of a concern. Say, this problem could be solved by having external cache servers and extra nodes around them. Latency would not be ideal, but the service would survive with minimal changes.
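A minimal, Linux-specific sketch of that mmap approach using only the standard library (the region size is arbitrary):

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
)

// Allocate cache memory outside the Go heap: the returned []byte is
// anonymous memory the GC never scans or moves. The application must
// manage the region's layout itself and must not store Go pointers in it.
func main() {
	const size = 1 << 30 // 1 GiB

	buf, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(buf)

	buf[0] = 42 // plain byte storage, invisible to the GC
	fmt.Println("off-heap region of", len(buf), "bytes; first byte:", buf[0])
}
```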
Excellent write up, and effective argument for Rust in this application and others. My cynical side sums it up as:<p>"Go sucked for us because we refused to own our tooling and make a special allocator for this service. Switching to Rust forced us to do that, and life got better"
This was an extremely interesting read.<p>I'm quite disappointed, though, that they did not update their Go version to 1.13 [0][1], which would normally have removed the spike issue, and thus the latency, before they moved to Rust...<p>Rust seems more performant with proper usage (Tokio + async), but I'm more worried about the ecosystem, which doesn't seem as mature as Go's.<p>We could quote the recent drama [2] with Actix...<p>[0]<a href="https://golang.org/doc/go1.13#runtime" rel="nofollow">https://golang.org/doc/go1.13#runtime</a>
[1]<a href="https://golang.org/doc/go1.12#runtime" rel="nofollow">https://golang.org/doc/go1.12#runtime</a>
[2]<a href="https://github.com/fafhrd91/actix-web-postmortem" rel="nofollow">https://github.com/fafhrd91/actix-web-postmortem</a>
Replatforming to solve this problem was a bit silly in my opinion. The solution to the problem was "do fewer allocations" which can be done in any language.
You're switching to Rust because Go is too slow? Colour me sceptical, but this seems more like an excuse to adopt a trendy language than a considered technical decision. Rust is designed first and foremost for memory safety, and it sacrifices a lot of developer time to achieve this, so if memory safety isn't high in your list of concerns Rust is probably not going to bring many benefits.
"We want to make sure Discord feels super snappy all the time" is hilarious coming from a program that is infamous for making you read 'quirky' loading lines while a basic chat application takes several seconds to start up.<p>Don't really know about Go versus Rust for this purpose, but don't really care because read states (like nearly everything that makes Discord less like IRC) is an anti-feature in any remotely busy server. Anything important enough that it shouldn't be missed can be pinned, and it encourages people to derail conversations by replying out of context to things posted hours or days ago.