IMO garbage collection is the epitome of sunk cost fallacy. Thirty years of good research thrown at a bad idea. The reality is we as developers choose not to give languages enough context to accurately infer the lifetime of objects. Instead of doing so we develop borderline self-aware programs to guess when we're done with objects. It wastes time, it wastes space, it wastes energy. If we'd spent that time developing smarter languages and compilers (Rust is a start, but not an end) we'd be better off as developers and as people. Garbage collection is just plain bad. I for one am glad we're finally ready to consider moving on.

Think about it: instead of finding a way of expressing when we're done with instances, we have a giant for loop that iterates over *all of memory* over and over and over to guess when we're done with things. What a mess! If your co-worker proposed this as a solution you'd probably slap them. This article proposes hardware accelerating that for loop. It's like a horse-drawn carriage accelerated by rockets. It's the *fastest* horse.
Azul Systems asked Intel to do this once... but instead created its own processors, with interesting memory-barrier properties, that for a while greatly sped up JVMs beyond what was possible (at the time) on x86-32/PPC/SPARC. Eventually they gave up and became a purely software company, but their "Java Mainframe" product was many times faster than the Intel machines of the age executing the same code, despite much slower CPUs. It died a quick death despite the cool factor.
Readable copy of the paper at Berkeley:

https://people.eecs.berkeley.edu/~krste/papers/maas-isca18-hwgc.pdf

ETA: this doesn't seem to be quite the paper that the story refers to, but it undoubtedly describes the same work in enough detail for people to get the gist of it. Darn paywalls.
They're comparing to an in-order CPU. Given that most CPUs are out-of-order (at least of the non-embedded variety, and GC is less used in such applications anyway), it would be better and more intellectually honest to compare to a typical CPU that performs GC. They kind of address this in the paper, but only in a short aside: "Note that previous research [1] showed that out-of-order CPUs, while moderately faster, are not the best trade-off point for GC (a result we confirmed in preliminary simulations)." So they don't quantify what any of this means.

I think it's an interesting idea, but it doesn't bode well when they seemingly choose the wrong target for comparison and hand-wave away the difference as insignificant.
> globally this represents a large amount of computing resources.

Much of which would just sit idle otherwise, on client machines. Of course, the energy savings still apply.

> He also points out that many garbage collection mechanisms can result in unpredictable pauses, where the computer system stops for a brief moment to clean up its memory.

This is more of a hard barrier, and it's the problem really being solved here.

All in all it's a pretty cool idea, but I think the impact would be different from what's discussed here. Truly high-performance computing is already written in non-GC languages. This hardware would give medium-intensity GC programs (read: web servers on the JVM, .NET, Node, Ruby) a boost, and could also allow some higher-but-not-peak-intensity software to be written with GC where it might not be today (games come to mind), although that could actually encourage *more* energy usage than it would save.
It reminds me of the days when I'd see Knuth's "97% of the time, premature optimisation blabla" quote every time someone tried to make something faster.

CPUs are not getting faster, yet it seems that using tools that make things run faster is somehow taboo.

Wirth's law: an adage on computer performance which states that software is getting slower more rapidly than hardware becomes faster.

Why is Java taught in university, and why is this language considered some kind of standard? Most OSes are written in C, yet most of Silicon Valley frowns upon writing C because of arrays. Even C++ is getting a bad reputation.
Objective-C ARC (automatic reference counting) solved the problem neatly for my iOS apps.

Is there some overhead? Maybe, but it's neatly spread out through the entire application lifetime, so there is rarely[1] a UI-freezing stutter associated with GC. To reduce the overhead I turned off thread safety and simply never access the same objects from more than one thread (an object has to be "handed off" first if it comes to that).

One wart on the body of ARC is KVO, which I avoid like the plague for many other reasons anyway.

The other wart is strong reference loops. This can be solved by the app developer by designing the architecture around the "ownership" concept (owners use strong references to their ownees; all other links are weak references). This is a good idea in itself, as it increases the clarity of the program. I do make an occasional slip, which is where I need to rely on Instruments, and I do wish I had better tools than that, something more automatic that would catch me in the act. Maybe a crawler that looks for loops in strong references during the development process but is quiet in release builds. Or at least give me a pattern to follow that makes it easy to catch my errors. For example, we could assign a sequential number to each allocated object, and only allow a higher-ranked object to strongly refer to a lower-ranked object. This won't work for everyone, but I wouldn't mind fitting my app to this mold if it gave me an immediate error when I slip.

[1] if you release a few million objects all at once it may stutter for a second. Could be handed off to a parallel thread maybe.
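A rough sketch of the "crawler that looks for loops in strong references" idea above, written in C++ rather than Objective-C for brevity. The registry and all names are hypothetical, it ignores reference removal entirely, and it is only meant to show the shape of such a tool, not any real ARC facility:

    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Hypothetical debug-only registry: each object records which objects it
    // holds strong references to. A periodic crawl then flags any loop.
    using ObjectId = const void*;
    static std::unordered_map<ObjectId, std::vector<ObjectId>> strong_edges;

    void note_strong_ref(ObjectId owner, ObjectId ownee) {
        strong_edges[owner].push_back(ownee);
    }

    // Depth-first search with colouring: 0 = unvisited, 1 = on the current
    // path, 2 = finished. Reaching a node that is on the current path means
    // we have found a strong-reference loop.
    static bool dfs(ObjectId node, std::unordered_map<ObjectId, int>& color) {
        color[node] = 1;
        auto it = strong_edges.find(node);
        if (it != strong_edges.end()) {
            for (ObjectId next : it->second) {
                if (color[next] == 1) {
                    std::cout << "strong-reference loop involving "
                              << node << " and " << next << "\n";
                    return true;
                }
                if (color[next] == 0 && dfs(next, color)) return true;
            }
        }
        color[node] = 2;
        return false;
    }

    // Returns true if any strong-reference loop exists in the registry.
    bool has_strong_cycle() {
        std::unordered_map<ObjectId, int> color;
        for (const auto& entry : strong_edges)
            if (color[entry.first] == 0 && dfs(entry.first, color))
                return true;
        return false;
    }

In a debug build every strong setter could call note_strong_ref and the crawl could run periodically; a release build would compile it all out.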
I’m probably wrong, but didn’t the Symbolics Lisp machines have some kind of hardware support for GC?

I think for platforms like Android this makes a lot of sense. It should help quite a bit with battery consumption and responsiveness. It also makes sense for server loads in Java or Go.
My day job is writing C for an embedded real-time system. No manual memory management necessary... because we're forced to declare all struct and array sizes at compile time! Not a malloc or free in sight. Obviously it's pretty limiting as far as algorithms go, beyond "read data off the bus, store it in a fixed array, perform a numeric calculation, write it back to the bus." But I've gotta say, it's pretty freeing to write C in such a limited environment.
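For anyone who hasn't worked in that environment, here is roughly what "all sizes declared at compile time, no malloc or free" looks like. The message layout and capacity are invented for illustration, and the sketch is written in C-flavoured C++:

    #include <cstddef>
    #include <cstdint>

    // Everything is sized at compile time; there is no malloc or free anywhere.
    // The message layout and the capacity of 64 are made up for illustration.
    struct BusMessage {
        std::uint32_t id;
        float         payload[4];
    };

    constexpr std::size_t kMaxMessages = 64;

    // Statically allocated storage, sized for the worst case up front.
    static BusMessage rx_buffer[kMaxMessages];
    static std::size_t rx_count = 0;

    // When the fixed buffer is full we refuse the message instead of growing.
    bool store_message(const BusMessage& msg) {
        if (rx_count >= kMaxMessages) return false;
        rx_buffer[rx_count++] = msg;
        return true;
    }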
From an environmental perspective, I wonder how much energy is consumed (and emissions generated) for garbage collection and interpreters. These things exist to make programming easier but are then duplicated across thousands of servers.

If everyone used some compiled language that was just a little simpler, a little safer, had just a little better memory management/tooling, or, like here, had better hardware support, how much would that reduce global emissions caused by data centers?
I'm skeptical; GC is closely tied to programming language runtimes. How is some accelerator going to know which pointers in an object are references to other GC objects and which are non-GC-domain pointers (like handles to foreign objects and whatnot)? How does the accelerator handle weak references and finalization?

People aren't going to massively rewrite their language runtimes to target a boutique GC accelerator.
One question I'd like to raise:

For small short-lived scripts and applications, do we even need to free any memory these days? For example, you write a script that takes several seconds to execute, moves files, computes stuff with strings, etc. Should we really invest time and effort in making the script interpreter free memory, when instead we could just exit normally and let the OS handle the cleanup?

I would imagine this kind of paradigm could be much faster to run because of less runtime work being performed. The allocator used could also be a simple linear allocator which just returns the next free address and increments the pointer. If multiple threads are used, there could be one allocator per thread.

What do people think of this?
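A minimal sketch of that linear allocator, assuming a single fixed arena whose contents are simply abandoned to the OS at process exit; the arena size and names are arbitrary, and a multi-threaded variant would give each thread its own arena:

    #include <cstddef>
    #include <cstdint>

    // One fixed arena for the whole short-lived process; nothing is ever freed.
    // The 64 MB size is arbitrary for illustration.
    constexpr std::size_t kArenaSize = 64u * 1024u * 1024u;
    static std::uint8_t arena[kArenaSize];
    static std::size_t  arena_used = 0;

    // Bump allocation: round up to the requested alignment (must be a power
    // of two), advance the pointer, and hand back the slot. There is no free().
    void* bump_alloc(std::size_t size,
                     std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (arena_used + align - 1) & ~(align - 1);
        if (start + size > kArenaSize) return nullptr;  // arena exhausted
        arena_used = start + size;
        return arena + start;
    }

When the process exits, the OS reclaims the whole arena at once, which is exactly the "let the OS handle the cleanup" behaviour described above.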
It does seem like just doing it in hardware may be a linear gain but isn't a fundamentally better algorithm. There's a proof that you do need to pause your program eventually, if you want to be sure you get all the garbage.
So my idea for GC is to offload it to a separate machine through a communications channel. The main CPU sends messages to the co-processor whenever it allocates memory or whenever it mutates, i.e. whenever it writes a pointer into allocated memory or into the root set (there could be special versions of the move instructions which send these messages as a side effect). There is a hardware queue for these messages, and the main processor stalls if it's full (if it's getting ahead of the co-processor).

The co-processor then maintains a reference graph any way it likes in its own memory. It determines when memory can be freed using any of the classic algorithms, and sends messages back to the main processor to indicate which memory regions can be freed.

This has some nice characteristics: the co-processor does not necessarily disturb the cache of the main processor (it can have its own memory). Garbage collection is transparent as long as the co-processor can process mutations at a faster rate than the main processor can produce them. The queue handles cases where the mutation rate is temporarily faster than that.
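A rough sketch of that protocol, with the hardware queue modelled as a bounded single-producer ring buffer and the "stall" as a spin. Every name here is hypothetical and the real mechanism would live in hardware, not in code; this only shows the shape of the message traffic:

    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    // Messages the main processor streams to the GC co-processor.
    enum class GcMsgType : std::uint8_t { Alloc, PointerWrite, RootWrite };

    struct GcMsg {
        GcMsgType      type;
        std::uintptr_t addr;   // object allocated, or slot being written
        std::uintptr_t value;  // pointer value being stored (unused for Alloc)
    };

    // Model of the hardware queue: bounded, one producer (the mutator), one
    // consumer (the co-processor). head/tail are free-running counters.
    constexpr std::size_t kQueueSize = 1024;  // must be a power of two
    static GcMsg queue[kQueueSize];
    static std::atomic<std::size_t> head{0}, tail{0};

    // Main-processor side: stall (spin) while the queue is full, i.e. while
    // the co-processor has fallen behind, then enqueue the message.
    void send_to_coprocessor(const GcMsg& msg) {
        std::size_t h = head.load(std::memory_order_relaxed);
        while (h - tail.load(std::memory_order_acquire) >= kQueueSize) { /* stall */ }
        queue[h % kQueueSize] = msg;
        head.store(h + 1, std::memory_order_release);
    }

    // The "special move instruction": perform the store, then tell the GC.
    void write_barrier(void** slot, void* new_value) {
        *slot = new_value;
        send_to_coprocessor({GcMsgType::PointerWrite,
                             reinterpret_cast<std::uintptr_t>(slot),
                             reinterpret_cast<std::uintptr_t>(new_value)});
    }

    // Co-processor side: drain messages and update its own reference graph.
    bool receive_on_coprocessor(GcMsg& out) {
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return false;  // empty
        out = queue[t % kQueueSize];
        tail.store(t + 1, std::memory_order_release);
        return true;
    }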
You can usually precisely control garbage collection by turning it off or forcing it to run. The cognitive load of handling memory manually is not insignificant. If you control memory manually, you eventually end up designing some mechanism like ref counting or something else to handle memory cleanup automatically. And there are significant arguments that ref counting might not be the best solution for all use cases. Best is a combination: the ability to handle memory manually, plus more automated garbage collection when there's a need to write stuff that doesn't necessarily have to be the absolute fastest. Kitchen-sink languages like C++ tend to have both and don't force the developer in either direction. Best would be to #define out 'new' and make manual memory handling explicit.
The irony of the thing is that in manual-memory-management languages you end up writing your own garbage collectors, and in garbage-collected languages you end up doing your own manual management. Unfortunately, if you look to a language to solve such complex problems you are heading straight for severe-disappointment land. Same shit, different package. I still prefer dynamic languages by a long margin because of their ability to do decent metaprogramming and reflection, which is essential for managing any form of data. Pick your poison and enjoy the hype while it lasts.
>> consumes a lot of computational power—up to 10 percent or more of the total time a CPU spends on an application.

I stopped reading there. 10% is nothing. For such a useful feature as automatic garbage collection, for the vast majority of applications, I'd gladly give away 50% of the CPU.

In terms of ensuring code correctness and robustness, if I had to choose between static typing and automatic garbage collection, I'd pick garbage collection every time. It adds a lot of value in terms of development efficiency and code simplicity.
Want to get away from garbage collection, retain safety, but think Rust is too invasive? Try compile-time reference counting: http://aardappel.github.io/lobster/memory_management.html
> but the automated process that CPUs are tasked with consumes a lot of computational power—up to 10 percent or more of the total time a CPU spends on an application.

Is that even a problem when most CPUs are idle 90% of the time, even when doing typical daily tasks?
The Kiwi scientific accelerator uses a similar approach with FPGAs, I believe: https://www.cl.cam.ac.uk/~djg11/kiwi/
Hmm so there's a coprocessor that does the GC... doesn't it need to lock the memory away from the main CPU while it does that? And doesn't this lead back to unpredictable pauses and slowdowns?
I never understood the need for Garbage Collectors.
In my opinion, the difficulties of memory management are extremely overrated. I've been writing C/C++ for almost 20 years and I've never encountered a difficult bug that would have been avoided by a garbage collector.

If a coder really has a hard time with manual memory management it means he can't really code; this is a beginner problem...
Maybe I'm naive, but with multi-core CPUs and parallel GCs, isn't it much the same? One core is mostly used just for GC, while the others do other things?

Edit: I guess they mention their chip itself can do the collection with a high degree of parallelism, so that's probably one more advantage. But CPUs with additional slower cores, and with a lot more cores, are in the works as well.
I don't care about being absolutely fast when writing code. The convenience of not having to care about memory management is far more important to me. That's why I like GCs.
Seems like we could just as easily stop using garbage collection... or even go back to reference counting / smart pointers and just live with the "limitation" that we can't have circular references.
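That "limitation" in concrete form, sketched with C++ smart pointers (the names are invented): two objects that hold strong references to each other are never destroyed, and making the back-reference weak is what breaks the cycle.

    #include <memory>

    struct Child;

    struct Parent {
        std::shared_ptr<Child> child;   // owner -> ownee: strong reference
    };

    struct Child {
        // A std::shared_ptr<Parent> here would form a reference cycle and
        // neither object's count would ever reach zero. A weak reference
        // observes the parent without owning it.
        std::weak_ptr<Parent> parent;
    };

    int main() {
        auto p = std::make_shared<Parent>();
        p->child = std::make_shared<Child>();
        p->child->parent = p;   // the back-edge does not own, so no leak
        return 0;               // both objects are destroyed here
    }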
In Java, I created thread-local resource pools (including strings) that eliminate garbage collection in sensitive routines. Of course, it's much faster in Java to perform pooled string comparison with ==. Likewise, I always use the indexed version of a for loop to avoid the iterator that would otherwise be allocated.

GC in Java is great for non-priority code, which is most of the application.