The post glosses over the most important part of Erlang's GC: it collects process heaps separately. This transforms a hard problem (collecting a global heap with low latency despite concurrent mutators) to a _much_ simpler problem, at the price of more copying. Compare Java's G1 with Erlang's GC; the former hurts my head.<p>For those problems that are amenable to Erlang's model, this is a fine solution. The only real improvement here would be making collection incremental.
Scaling up an MQTT<->webhook relay that I wrote in Elixir to 1000’s of long running connections, I found that I needed to manually trigger periodic GCs on my long lived processes.<p>As binary strings work their way through the pipelines via messages, it leaves binaries on the binary heap that don’t go away because the ref count stays above 1. There are a number of GC parameters one can tune on a per process level that might cause a long lived process to collect more aggressively. But my long lived processes have a natural “ratchet” point where it was just easy to throw a collect in. This solved all of my slow growth memory problems.<p>I’ve read elsewhere that Erlangs GC benefits often on the basis that must Erlanger processes are short lived.
ORCA (as part of the Pony compiled language) includes a more performant GC than C4 or BEAM/HiPE. It does so by reducing almost to zero the need to do global GC pauses by sharding the heap per actor, zero-copy message passing, fine-grained concurrent sharing semantics, and lock-free data structures.
This is another article with more details: <a href="https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html" rel="nofollow">https://hamidreza-s.github.io/erlang%20garbage%20collection%...</a>