The problematic code in the article is, AFAICT, using an object finalizer to free manually allocated memory; such approaches seldom work well, even with precise GCs.<p>Thread stacks are effectively manually allocated blocks of memory. You create a thread, which allocates the stack, and as long as the thread lives, the stack is kept alive - it's self-sustaining. The thread must die by explicit programmatic action, which in turn will free its allocated block of stack memory.<p>Using finalizers at all is usually an anti-pattern in a GC world. The presence of finalizers is a very strong hint that the GC is being used to manage resources other than memory, something that GC is a poor fit for, because other resources almost certainly have no necessary correlation with memory pressure; and GCs usually only monitor GC heap memory pressure.<p>That's not to say that there aren't plenty of edge cases where you can end up with lots of false roots that artificially lengthen object lifetimes with a conservative GC. Putting a thread stack in your cycle of object references and relying on GC pressure to break the cycle isn't a strongly motivating one to my mind, though.
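To make the "don't lean on finalizers" point concrete, here is a minimal sketch of explicit, scope-bound release. `NativeBuffer` is a hypothetical stand-in for any manually allocated resource (it is not from the article), and the `live` counter exists only to make the release observable.

```python
class NativeBuffer:
    """Hypothetical stand-in for a manually allocated, non-GC resource."""

    live = 0  # number of buffers currently "allocated" (for illustration only)

    def __init__(self, size):
        self.size = size
        self.closed = False
        NativeBuffer.live += 1

    def close(self):
        # Idempotent explicit release; does not depend on GC timing.
        if not self.closed:
            self.closed = True
            NativeBuffer.live -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def use_buffer():
    # The buffer is released when the block exits, regardless of memory pressure.
    with NativeBuffer(4096) as buf:
        assert not buf.closed
    return buf.closed, NativeBuffer.live
```

The release happens at a point the program controls, so it is independent of whether the collector ever notices memory pressure.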
Interestingly enough, SBCL on x86/x64 has a conservative, but moving, GC. It knows some, but not all, roots precisely, so it pins any objects that are reachable through conservative roots.<p>Its earlier implementations were on RISC chips that had 24 or more GPRs, so the implementation was simple: two stacks, with the local registers divided in half between boxed and unboxed values. This obviously didn't work when porting to x86, which had far fewer registers.<p>The ARM port, I believe, uses the non-conservative approach, despite having one fewer register than x64 (the x64 port was derived from the x86 port, so it uses the same register assignments).
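A rough sketch of the pinning idea, not SBCL's actual code: any word from a conservative root that falls inside an object's address range pins that object, and everything else stays free to move. The heap model (a dict of start address to size) and both function names are assumptions for illustration, and pinning at object granularity is a simplification.

```python
def pin_ambiguous(objects, ambiguous_words):
    """Return start addresses of objects hit by any ambiguous word.

    objects: dict mapping start address -> size in bytes (hypothetical model).
    ambiguous_words: raw machine words that may or may not be pointers.
    """
    pinned = set()
    for word in ambiguous_words:
        for start, size in objects.items():
            # Interior pointers count too: the word only has to land in range.
            if start <= word < start + size:
                pinned.add(start)
    return pinned


def movable_objects(objects, ambiguous_words):
    # Objects not hit by an ambiguous word can be relocated by the collector.
    pinned = pin_ambiguous(objects, ambiguous_words)
    return sorted(set(objects) - pinned)
```

For example, with objects at `0x1000`, `0x2000` and `0x3000` and an ambiguous word `0x2008`, only the object at `0x2000` is pinned.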
I am not an expert in garbage collection techniques, but this article does not even mention locality of reference (copying GCs improve locality on each compaction) and how many cache misses are introduced by increased fragmentation. Are there any benchmarks on this?
A safepoint in x86 is nothing more than the instruction mov [rip+0x1234], eax. That shouldn't cause a major slowdown, should it? Also, safepoints are useful for features other than GC. For example, you can inspect a running thread's call stack. That is useful when debugging and when objectifying a thread's state.<p>Stack maps can be made a bit smaller by pushing all registers onto the stack during GC and popping them afterwards. That way, the maps only need to describe stack locations and not individual registers.<p>Btw, the article is really good. This is the kind of stuff I keep coming back to HN for!
Conservative GC would probably work well enough for the JVM because there are no value types or inline arrays, which more easily masquerade as roots; i.e. a random sequence of bytes, as used in crypto or hashing, would yield a lot of false positives.<p>By comparison, the CLR is a much worse fit, because value types and stack/inline/fixed arrays mean false positives would be much higher for some applications.
The Chakra JavaScript engine uses a conservative generational mark-and-sweep collector with many phases running in parallel with code execution. It looks like Chakra is now on GitHub (and with an MIT license). In Chakra the GC is called a 'Recycler', which can throw one for a loop when searching for the GC implementation.
This is just off the top of my head, but it made me wonder: are there any VMs that put a stack map header of some sort as a literal in the stack?<p>E.g. for each frame the compiler orders roots first and then other primitives. Then, as you enter the frame, write the number of roots to the stack. When the GC walks the stack it can see precisely which are roots.
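The idea above could be sketched like this, with the stack modeled as a plain list of words. The layout is an assumption made for the sketch: each frame is written as roots, then primitives, then a two-word header (primitive count, root count), so the walker can find frame boundaries without a frame pointer. `push_frame` and `walk_roots` are hypothetical names.

```python
def push_frame(stack, roots, prims):
    """Push one frame: roots first, then primitives, then a small header."""
    stack.extend(roots)
    stack.extend(prims)
    stack.append(len(prims))  # header word: number of non-root slots
    stack.append(len(roots))  # header word: number of root slots


def walk_roots(stack):
    """Walk frames from the top of the stack down, collecting only root slots."""
    found = []
    top = len(stack)
    while top > 0:
        n_roots = stack[top - 1]
        n_prims = stack[top - 2]
        frame_start = top - 2 - n_prims - n_roots
        # Roots occupy the first n_roots slots of the frame.
        found.extend(stack[frame_start:frame_start + n_roots])
        top = frame_start
    return found
```

Note that a primitive holding a value that looks like a heap address is never reported, because the walker trusts the header rather than guessing from the bit pattern. The cost is the extra header words written on every frame entry.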
The pain from conservative GC depends on how much of your address space you are using.<p>In the 32-bit age, you ran into problems more and more as your heap approached the GB range. At some point the probability that you end up with a false root that keeps a lot of garbage alive goes to 1.<p>In the 64-bit age we get a respite, although many systems don't really use 64-bit pointers.
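A back-of-the-envelope version of that argument, under a strong simplifying assumption that the scanned words are uniformly random over the address space (real data is not): the chance that at least one of n words lands in the heap is 1 - (1 - H/2^b)^n, where H is the heap size and b the pointer width. The function name and parameters are made up for this sketch, and it only models "looks like a heap pointer", not how much garbage such a word retains.

```python
import math


def p_false_root(heap_bytes, address_bits, n_words):
    """Probability that at least one of n_words uniformly random words
    falls inside a heap of heap_bytes in a 2**address_bits address space."""
    frac = heap_bytes / 2 ** address_bits
    # 1 - (1 - frac)**n_words, computed stably for very small frac.
    return -math.expm1(n_words * math.log1p(-frac))
```

With a 1 GiB heap and 100 non-pointer words, this is essentially 1 for 32-bit addresses and well under one in a hundred million for a full 64-bit space, which matches the "respite" described above.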