IMO garbage collection is the epitome of sunk cost fallacy. Thirty years of good research thrown at a bad idea. The reality is we as developers choose not to give languages enough context to accurately infer the lifetime of objects. Instead of doing so we develop borderline self-aware programs to guess when we're done with objects. It wastes time, it wastes space, it wastes energy. If we'd spent that time developing smarter languages and compilers (Rust is a start, but not an end) we'd be better off as developers and as people. Garbage collection is just plain bad. I for one am glad we're finally ready to consider moving on.

Think about it: instead of finding a way of expressing when we're done with instances, we have a giant for loop that iterates over *all of memory* over and over and over to guess when we're done with things. What a mess! If your co-worker proposed this as a solution you'd probably slap them. This article proposes hardware accelerating that for loop. It's like a horse-drawn carriage accelerated by rockets. It's the *fastest* horse.
Azul Systems asked Intel to do this once... but instead created its own processors, with interesting memory-barrier properties, that for a while greatly sped up JVMs beyond what was possible (at the time) on x86-32/PPC/SPARC. Eventually they gave up and became a purely software company, but their "Java Mainframe" product was many times faster than the Intel machines of the age executing the same code, despite much slower CPUs. It died a quick death despite the cool factor.
Readable copy of the paper at Berkeley:

https://people.eecs.berkeley.edu/~krste/papers/maas-isca18-hwgc.pdf

ETA: this doesn't seem to be quite the paper that the story refers to, but it undoubtedly describes the same work in enough detail for people to get the gist of it. Darn paywalls.
They're comparing to an in-order CPU. Given that most CPUs are out-of-order (at least of the non-embedded variety, and GC is less used in such applications anyway), it would be better and more intellectually honest to compare to a typical CPU that performs GC. They kind of address this in the paper, but only in a short aside: "Note that previous research [1] showed that out-of-order CPUs, while moderately faster, are not the best trade-off point for GC (a result we confirmed in preliminary simulations)." So they don't quantify what any of this means.

I think it's an interesting idea, but it doesn't bode well when they seemingly choose the wrong target for comparison and hand-wave away the difference as insignificant.
> globally this represents a large amount of computing resources.

Much of which would just sit idle otherwise, on client machines. Of course, the energy savings still apply.

> He also points out that many garbage collection mechanisms can result in unpredictable pauses, where the computer system stops for a brief moment to clean up its memory.

This is more of a hard barrier, and it's the problem really being solved here.

All in all it's a pretty cool idea, but I think the impact would be different from what's discussed here. Truly high-performance computing is already written in non-GC languages. This hardware would give medium-intensity GC programs (read: web servers on the JVM, .NET, Node, Ruby) a boost, and could also allow some higher-but-not-peak-intensity software to be written with GC where it might not be today (games come to mind), although that could actually encourage *more* energy usage than it would save.
It reminds me of the days when I'd see Knuth's "97% of the time, premature optimisation blabla" quote every time someone tried to make something faster.

CPUs are not getting faster, yet it seems that using tools that make things run faster is somehow taboo.

Wirth's law: an adage on computer performance which states that software is getting slower more rapidly than hardware becomes faster.

Why is Java taught in university, and why is this language considered some kind of standard? Most OSes are written in C, yet most of Silicon Valley frowns upon writing C because of arrays. Even C++ is getting a bad reputation.
Objective-C ARC (automatic reference counting) solved the problem neatly for my iOS apps.

Is there some overhead? Maybe, but it's neatly spread out through the entire application lifetime, so there is rarely[1] a UI-freezing stutter associated with GC. To reduce the overhead I turned off thread safety and simply never access the same objects from more than one thread (an object has to be "handed off" first if it comes to that).

One wart on the body of ARC is KVO, which I avoid like the plague for many other reasons anyway.

The other wart is strong reference loops. This can be solved by the app developer by designing the architecture around the "ownership" concept (owners use strong references to their ownees; all other links are weak references). This is a good idea in itself, as it increases the clarity of the program. I do make an occasional slip, which is where I need to rely on Instruments, and I do wish I had better tools than that, something more automatic that would catch me in the act. Maybe a crawler that looks for loops in strong references during the development process but is quiet in release builds. Or at least give me a pattern to follow that makes it easy to catch my errors. For example, we could assign a sequential number to each allocated object, and only allow a higher-ranked object to strongly refer to a lower-ranked object. This won't work for everyone, but I wouldn't mind fitting my app to this mold if it gave me an immediate error when I slip.

[1] if you release a few million objects all at once it may stutter for a second. Could be handed off to a parallel thread maybe.
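A rough sketch of the "crawler that looks for loops in strong references" idea above, written in C++ rather than Objective-C for brevity. The registry and all names are hypothetical, it ignores reference removal entirely, and it is only meant to show the shape of such a tool, not any real ARC facility:

    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Hypothetical debug-only registry: each object records which objects it
    // holds strong references to. A periodic crawl then flags any loop.
    using ObjectId = const void*;
    static std::unordered_map<ObjectId, std::vector<ObjectId>> strong_edges;

    void note_strong_ref(ObjectId owner, ObjectId ownee) {
        strong_edges[owner].push_back(ownee);
    }

    // Depth-first search with colouring: 0 = unvisited, 1 = on the current
    // path, 2 = finished. Reaching a node that is on the current path means
    // we have found a strong-reference loop.
    static bool dfs(ObjectId node, std::unordered_map<ObjectId, int>& color) {
        color[node] = 1;
        auto it = strong_edges.find(node);
        if (it != strong_edges.end()) {
            for (ObjectId next : it->second) {
                if (color[next] == 1) {
                    std::cout << "strong-reference loop involving "
                              << node << " and " << next << "\n";
                    return true;
                }
                if (color[next] == 0 && dfs(next, color)) return true;
            }
        }
        color[node] = 2;
        return false;
    }

    // Returns true if any strong-reference loop exists in the registry.
    bool has_strong_cycle() {
        std::unordered_map<ObjectId, int> color;
        for (const auto& entry : strong_edges)
            if (color[entry.first] == 0 && dfs(entry.first, color))
                return true;
        return false;
    }

In a debug build every strong setter could call note_strong_ref and the crawl could run periodically; a release build would compile it all out.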
I’m probably wrong, but didn’t the Symbolics Lisp machines have some kind of hardware support for GC?

I think for platforms like Android this makes a lot of sense. It should help quite a bit with battery consumption and responsiveness. It also makes sense for server loads in Java or Go.
My day job is writing C for an embedded real-time system. No manual memory management necessary... because we're forced to declare all struct and array sizes at compile time! Not a malloc or free in sight. Obviously it's pretty limiting as far as algorithms go, beyond "read data off the bus, store it in a fixed array, perform a numeric calculation, write it back to the bus." But I've gotta say, it's pretty freeing to write C in such a limited environment.
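For anyone who hasn't worked in that environment, here is roughly what "all sizes declared at compile time, no malloc or free" looks like. The message layout and capacity are invented for illustration, and the sketch is written in C-flavoured C++:

    #include <cstddef>
    #include <cstdint>

    // Everything is sized at compile time; there is no malloc or free anywhere.
    // The message layout and the capacity of 64 are made up for illustration.
    struct BusMessage {
        std::uint32_t id;
        float         payload[4];
    };

    constexpr std::size_t kMaxMessages = 64;

    // Statically allocated storage, sized for the worst case up front.
    static BusMessage rx_buffer[kMaxMessages];
    static std::size_t rx_count = 0;

    // When the fixed buffer is full we refuse the message instead of growing.
    bool store_message(const BusMessage& msg) {
        if (rx_count >= kMaxMessages) return false;
        rx_buffer[rx_count++] = msg;
        return true;
    }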
From an environmental perspective, I wonder how much energy is consumed (and emissions generated) for garbage collection and interpreters. These things exist to make programming easier but are then duplicated across thousands of servers.

If everyone used some compiled language that was just a little simpler, a little safer, had just a little better memory management/tooling, or, like here, had better hardware support, how much would that reduce global emissions caused by data centers?
I'm skeptical; GC is closely tied to programming language runtimes. How is some accelerator going to know which pointers in an object are references to other GC objects and which are non-GC-domain pointers (like handles to foreign objects and whatnot)? How does the accelerator handle weak references and finalization?

People aren't going to massively rewrite their language runtimes to target a boutique GC accelerator.
One question I'd like to raise:

For small short-lived scripts and applications, do we even need to free any memory these days? For example, you write a script that takes several seconds to execute, moves files, computes stuff with strings, etc. Should we really invest time and effort in making the script interpreter free memory, when instead we could just exit normally and let the OS handle the cleanup?

I would imagine this kind of paradigm could be much faster to run because of less runtime work being performed. The allocator used could also be a simple linear allocator which just returns the next free address and increments the pointer. If multiple threads are used, there could be one allocator per thread.

What do people think of this?
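A minimal sketch of that linear allocator, assuming a single fixed arena whose contents are simply abandoned to the OS at process exit; the arena size and names are arbitrary, and a multi-threaded variant would give each thread its own arena:

    #include <cstddef>
    #include <cstdint>

    // One fixed arena for the whole short-lived process; nothing is ever freed.
    // The 64 MB size is arbitrary for illustration.
    constexpr std::size_t kArenaSize = 64u * 1024u * 1024u;
    static std::uint8_t arena[kArenaSize];
    static std::size_t  arena_used = 0;

    // Bump allocation: round up to the requested alignment (must be a power
    // of two), advance the pointer, and hand back the slot. There is no free().
    void* bump_alloc(std::size_t size,
                     std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (arena_used + align - 1) & ~(align - 1);
        if (start + size > kArenaSize) return nullptr;  // arena exhausted
        arena_used = start + size;
        return arena + start;
    }

When the process exits, the OS reclaims the whole arena at once, which is exactly the "let the OS handle the cleanup" behaviour described above.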
It does seem like just doing it in hardware may be a linear gain but isn't a fundamentally better algorithm. There's a proof that you do need to pause your program eventually, if you want to be sure you get all the garbage.
So my idea for GC is to offload it to a separate machine through a communications channel. The main CPU sends messages to the co-processor whenever it allocates memory or whenever it mutates, i.e. whenever it writes a pointer into allocated memory or into the root set (there could be special versions of the move instructions which send these messages as a side effect). There is a hardware queue for these messages, and the main processor stalls if it's full (if it's getting ahead of the co-processor).

The co-processor then maintains a reference graph any way it likes in its own memory. It determines when memory can be freed using any of the classic algorithms, and sends messages back to the main processor to indicate which memory regions can be freed.

This has some nice characteristics: the co-processor does not necessarily disturb the cache of the main processor (it can have its own memory). Garbage collection is transparent as long as the co-processor can process mutations at a faster rate than the main processor can produce them. The queue handles cases where the mutation rate is temporarily faster than that.
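A rough sketch of that protocol, with the hardware queue modelled as a bounded single-producer ring buffer and the "stall" as a spin. Every name here is hypothetical and the real mechanism would live in hardware, not in code; this only shows the shape of the message traffic:

    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    // Messages the main processor streams to the GC co-processor.
    enum class GcMsgType : std::uint8_t { Alloc, PointerWrite, RootWrite };

    struct GcMsg {
        GcMsgType      type;
        std::uintptr_t addr;   // object allocated, or slot being written
        std::uintptr_t value;  // pointer value being stored (unused for Alloc)
    };

    // Model of the hardware queue: bounded, one producer (the mutator), one
    // consumer (the co-processor). head/tail are free-running counters.
    constexpr std::size_t kQueueSize = 1024;  // must be a power of two
    static GcMsg queue[kQueueSize];
    static std::atomic<std::size_t> head{0}, tail{0};

    // Main-processor side: stall (spin) while the queue is full, i.e. while
    // the co-processor has fallen behind, then enqueue the message.
    void send_to_coprocessor(const GcMsg& msg) {
        std::size_t h = head.load(std::memory_order_relaxed);
        while (h - tail.load(std::memory_order_acquire) >= kQueueSize) { /* stall */ }
        queue[h % kQueueSize] = msg;
        head.store(h + 1, std::memory_order_release);
    }

    // The "special move instruction": perform the store, then tell the GC.
    void write_barrier(void** slot, void* new_value) {
        *slot = new_value;
        send_to_coprocessor({GcMsgType::PointerWrite,
                             reinterpret_cast<std::uintptr_t>(slot),
                             reinterpret_cast<std::uintptr_t>(new_value)});
    }

    // Co-processor side: drain messages and update its own reference graph.
    bool receive_on_coprocessor(GcMsg& out) {
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return false;  // empty
        out = queue[t % kQueueSize];
        tail.store(t + 1, std::memory_order_release);
        return true;
    }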
You can usually precisely control garbage collection by turning it off or forcing it to run. The cognitive load of handling memory manually is not insignificant. If you control memory manually, you eventually end up designing some mechanism like ref counting or something else to handle memory cleanup automatically. And there are significant arguments that ref counting might not be the best solution for all use cases. Best is a combination: the ability to handle memory manually, plus more automated garbage collection when there's a need to write stuff that doesn't necessarily have to be the absolute fastest. Kitchen-sink languages like C++ tend to have both and don't force the developer in either direction. Best would be to #define out 'new' and make manual memory handling explicit.
The irony of the thing is that in manual-memory-management languages you end up writing your own garbage collectors, and in garbage-collected languages you end up doing your own manual management. Unfortunately, if you look to a language to solve such complex problems you are heading straight for severe-disappointment land. Same shit, different package. I still prefer dynamic languages by a long margin because of their ability to do decent metaprogramming and reflection, which is essential for managing any form of data. Pick your poison and enjoy the hype while it lasts.
>> consumes a lot of computational power—up to 10 percent or more of the total time a CPU spends on an application.

I stopped reading there. 10% is nothing. For such a useful feature as automatic garbage collection, for the vast majority of applications, I'd gladly give away 50% of the CPU.

In terms of ensuring code correctness and robustness, if I had to choose between static typing and automatic garbage collection, I'd pick garbage collection every time. It adds a lot of value in terms of development efficiency and code simplicity.
Want to get away from garbage collection, retain safety, but think Rust is too invasive? Try compile-time reference counting: http://aardappel.github.io/lobster/memory_management.html
> but the automated process that CPUs are tasked with consumes a lot of computational power—up to 10 percent or more of the total time a CPU spends on an application.

Is that even a problem when most CPUs are idle 90% of the time, even when doing typical daily tasks?
The Kiwi scientific accelerator uses a similar approach with FPGAs, I believe: https://www.cl.cam.ac.uk/~djg11/kiwi/
Hmm so there's a coprocessor that does the GC... doesn't it need to lock the memory away from the main CPU while it does that? And doesn't this lead back to unpredictable pauses and slowdowns?
I never understood the need for Garbage Collectors.
In my opinion, the difficulties of memory management are extremely overrated. I've been writing C/C++ for almost 20 years and I've never encountered a difficult bug that would have been avoided by a garbage collector.

If a coder really has a hard time with manual memory management it means he can't really code; this is a beginner problem...
Maybe I'm naive, but with multi-core CPUs and parallel GCs, isn't it much the same? One core is mostly used just for GC, while the others do other things?

Edit: I guess they mention their chip itself can do the collection with a high degree of parallelism, so that's probably one more advantage. But CPUs with additional slower cores, and with a lot more cores, are in the works as well.
I don't care about being absolutely fast when writing code. The convenience of not having to care about memory management is far more important to me. That's why I like GCs.
Seems like we could just as easily stop using garbage collection... or even go back to reference counting / smart pointers and just live with the "limitation" that we can't have circular references.
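That "limitation" in concrete form, sketched with C++ smart pointers (the names are invented): two objects that hold strong references to each other are never destroyed, and making the back-reference weak is what breaks the cycle.

    #include <memory>

    struct Child;

    struct Parent {
        std::shared_ptr<Child> child;   // owner -> ownee: strong reference
    };

    struct Child {
        // A std::shared_ptr<Parent> here would form a reference cycle and
        // neither object's count would ever reach zero. A weak reference
        // observes the parent without owning it.
        std::weak_ptr<Parent> parent;
    };

    int main() {
        auto p = std::make_shared<Parent>();
        p->child = std::make_shared<Child>();
        p->child->parent = p;   // the back-edge does not own, so no leak
        return 0;               // both objects are destroyed here
    }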
In Java, I created thread-local resource pools (including strings) that eliminate garbage collection in sensitive routines. Of course, it's much faster in Java to perform pooled string comparison with ==. Likewise, I always use the indexed version of a for loop to avoid the iterator that would otherwise be allocated.

GC in Java is great for non-priority code, which is most of the application.