Studying garbage collection is a wonderful education in algorithm engineering. Despite decades of work, there is no "best" GC algorithm. Instead, there are different points in a design space that trades off space, throughput, latency, completeness, and so on. Moreover, the various algorithms are linked by deep correspondences (e.g., Bacon's result that all collectors lie on a spectrum between pure tracing and pure reference counting, and that schemes like generational collection are hybrids).
There have been a lot of posts related to garbage collection lately, but none of them touched on what I see as a crucial issue: why is garbage collection needed to begin with?

Could you do without it? What is the key point that made it necessary?

I'm aware that it was introduced by McCarthy in the original Lisp paper in 1960, of course. But I suspect what McCarthy originally meant is not what garbage collection turned out to be. What I suspect he meant was that there needs to be a way for memory to be managed. malloc/free offer a way for memory to be managed, and presumably they weren't invented until C arrived nine years later. What McCarthy might have meant is what became malloc/free in C, which doesn't need garbage collection.

C isn't the only flag here. Was there any OS on the IBM 704 used to implement the original Lisp? Did that OS support multiprocessing? Because if it didn't (UNIX wasn't invented until 1969 either), it would make sense for all memory to be available to a single process. And it would mean that when people said "garbage collection" they were envisioning something like malloc/free.

(Also, databases and operating systems can live without whatever makes garbage collection necessary, since they don't use it, and those are pretty complex and fast pieces of software.)

So, what makes garbage collection different from malloc/free, and why is it necessary? I'd love to learn more about that.
While the "Metronome" has very predictable behaviour that probably makes it the best GC for real-time purposes, it still has a maximum GC load before it gets backed up. If it gets backed up too far... forget about timing guarantees, because the system will fail. The "Metronome" collector can guarantee a known and tunable GC capacity over time (in terms of objects collected/sec), which is good. But the flip side is that you need to be able to guarantee that your application will never exceed that capacity, at least not for any sustained period of time.

In order to provide hard real-time guarantees in a garbage-collected system, you need to know that there is no situation in which the system produces garbage faster than the collector can collect it. With manual deallocation, you can prove that with static analysis. With garbage collection you have to demonstrate it empirically, using dynamic analysis. That requires exhaustive testing to make sure you've covered the worst-case scenario.
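To make that capacity argument concrete, here is a minimal back-of-envelope sketch (in Java, with invented numbers and class names, not Metronome's actual tuning interface): if the worst-case sustained allocation rate ever exceeds the tuned collection rate, the collector must eventually fall behind.

    // Hypothetical headroom check: compare a measured worst-case allocation rate
    // against the collector's tuned collection capacity. Numbers are illustrative.
    public class GcHeadroomCheck {
        public static void main(String[] args) {
            double collectorCapacityMBPerSec = 200.0;   // tuned GC capacity (assumed)
            double worstCaseAllocationMBPerSec = 150.0; // observed via dynamic analysis

            double headroomMBPerSec = collectorCapacityMBPerSec - worstCaseAllocationMBPerSec;
            if (headroomMBPerSec <= 0) {
                // Sustained overload: the backlog grows and timing guarantees are void.
                System.out.println("Collector cannot keep up; retune GC or reduce allocation");
            } else {
                System.out.println("Sustained headroom: " + headroomMBPerSec + " MB/s");
            }
        }
    }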
There are actually quite a lot of hard real-time, mission-critical systems (mostly defense) using RTSJ (Real-Time Specification for Java) implementations in production, but I don't know how many make use of realtime GC (RTSJ allows for semi-manual memory management, much simpler than malloc/free but not as easy as a full GC). Some RTSJ implementations have a realtime GC, like IBM's WebSphere Real Time (http://www-03.ibm.com/software/products/us/en/real-time/) -- that's the one using Metronome -- and Aicas Jamaica VM (http://www.aicas.com/). Sun/Oracle also had an RTSJ JVM with a realtime GC (said to be better than Metronome), but it seems to have been discontinued recently.
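For a flavour of that semi-manual RTSJ style, here is a minimal scoped-memory sketch (assuming the standard javax.realtime classes; the sizes and the work inside are invented): everything allocated inside the scope is reclaimed in bulk when the scope is exited, with no tracing collector and no explicit free.

    import javax.realtime.LTMemory;
    import javax.realtime.RealtimeThread;

    public class ScopedAllocationSketch {
        public static void main(String[] args) throws InterruptedException {
            // A scoped memory area with linear-time allocation; sizes are illustrative.
            final LTMemory scope = new LTMemory(64 * 1024, 256 * 1024);

            // Scoped memory must be entered from a real-time thread.
            RealtimeThread rt = new RealtimeThread() {
                public void run() {
                    scope.enter(new Runnable() {
                        public void run() {
                            // Allocated in the scope, not the garbage-collected heap;
                            // reclaimed en masse when enter() returns.
                            byte[] buffer = new byte[1024];
                            // ... bounded real-time work with buffer ...
                        }
                    });
                }
            };
            rt.start();
            rt.join();
        }
    }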
The author starts the piece enthusiastically marvelling at the fact that real-time garbage collectors exist, but the article doesn't go very deep into how this particular one does it.

I myself was a bit disappointed when I read the limitations, which reveal that the simple laws still hold: you can't make these guarantees without knowing exactly the upper limit on the amount of memory you are going to allocate.

If you design a real-time system that dynamically allocates and deallocates objects, wouldn't it be almost or just as easy to implement manual memory management (through pools or whatnot) as it would be to correctly identify the maximum amount of allocated memory?
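As one example of the manual route mentioned above, here is a minimal fixed-size object pool sketch (the class names and sizes are invented for illustration): everything is pre-allocated up front, so the memory bound is explicit and no garbage is produced in steady state.

    import java.util.ArrayDeque;

    // A minimal fixed-capacity pool: all objects are allocated up front,
    // so the memory bound is known exactly and steady-state allocation is zero.
    public class MessagePool {
        public static final class Message {
            final byte[] payload = new byte[256];
        }

        private final ArrayDeque<Message> free = new ArrayDeque<Message>();

        public MessagePool(int capacity) {
            for (int i = 0; i < capacity; i++) {
                free.push(new Message());
            }
        }

        // Returns null when the pool is exhausted; the caller must handle that,
        // which is exactly the "know your upper limit" burden discussed above.
        public Message acquire() {
            return free.poll();
        }

        public void release(Message m) {
            free.push(m);
        }
    }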
Or you could go for pauseless garbage collection; then you only have to concern yourself with the collector throughput keeping up with the allocations.

http://paperhub.s3.amazonaws.com/d14661878f7811e5ee9c43de88414e86.pdf
I wonder if this could also be improved by "time stealing": if the mutator is idling, it waives its slice; if the GC doesn't expect to collect much, it waives its slice. The result would be more irregular but still able to give guarantees.
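A minimal sketch of that slice-waiving idea (everything here is hypothetical, not how Metronome actually schedules): each period, the mutator and the collector get fixed slices, but either side can hand its slice back when it has nothing to do.

    // Hypothetical slice scheduler: per period, the mutator and the GC each get a
    // fixed time slice, but an idle party can waive its slice to the other.
    // This illustrates the comment's idea, not any real collector's scheduler.
    public class TimeStealingScheduler {
        static final long MUTATOR_SLICE_MS = 8;
        static final long GC_SLICE_MS = 2;

        static boolean mutatorHasWork() { return Math.random() < 0.7; } // stand-in
        static boolean gcHasWork()      { return Math.random() < 0.5; } // stand-in

        static void runMutator(long budgetMs)   { /* run application code for budgetMs */ }
        static void runCollector(long budgetMs) { /* do incremental GC work for budgetMs */ }

        public static void main(String[] args) {
            for (int period = 0; period < 100; period++) {
                long mutatorBudget = MUTATOR_SLICE_MS;
                long gcBudget = GC_SLICE_MS;

                // Either side waives its slice when it has nothing to do.
                if (!mutatorHasWork()) { gcBudget += mutatorBudget; mutatorBudget = 0; }
                else if (!gcHasWork()) { mutatorBudget += gcBudget; gcBudget = 0; }

                runMutator(mutatorBudget);
                runCollector(gcBudget);
            }
        }
    }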
I'm not sure I agree that work-based allocators are out.
Specifically, if you follow/mark N references for each M words allocated, you can guarantee not to fall behind and still have a strict upper bound on collection cost and run-time jitter. This adds a linear-in-size cost to memory allocation, which typically already has an amortized linear cost (because you touch all the memory you allocate), so it's analytically very well behaved.
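A minimal sketch of that work-based idea (an invented, simplified tracer, not any particular VM's collector): the allocator performs a bounded number of marking steps proportional to the words it hands out, so tracing progress is tied to allocation volume.

    import java.util.ArrayDeque;

    // Sketch of a work-based incremental collector: every allocation of M words
    // pays for N marking steps, so mark progress keeps pace with allocation.
    // The object graph and constants are invented for illustration.
    public class WorkBasedAllocator {
        static final int MARK_STEPS_PER_WORD = 2; // the N/M ratio, chosen arbitrarily

        static final class Obj {
            boolean marked;
            Obj[] refs = new Obj[0];
        }

        // At the start of a GC cycle, the roots would be pushed onto this stack.
        private final ArrayDeque<Obj> markStack = new ArrayDeque<Obj>();

        public Obj allocate(int words) {
            // Do a bounded amount of tracing work before handing out memory,
            // proportional to the size of this allocation.
            markSteps(words * MARK_STEPS_PER_WORD);
            return new Obj(); // stand-in for carving `words` out of the heap
        }

        private void markSteps(int budget) {
            while (budget-- > 0 && !markStack.isEmpty()) {
                Obj o = markStack.pop();
                for (Obj child : o.refs) {
                    if (child != null && !child.marked) {
                        child.marked = true;
                        markStack.push(child);
                    }
                }
            }
        }
    }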