
The death of optimizing compilers [pdf]

217 points by fcambus, about 10 years ago

34 comments

juliansimioni · about 10 years ago
I wish this was given in a better format instead of just slides; hopefully the talk is actually better. Here's what I think the talk is about:

The pervasive view of software performance is that compilers are better than humans at optimizing code, but the few humans who optimize important bits of code to the maximum extent disagree. Similarly, computer programs today are increasingly diverging into a state where there is a tiny amount of extremely performance-critical code, and a large amount of code where performance is so good on our hardware today that even horribly unoptimized code has no noticeable effect on performance.

Thus, optimizing compilers are useless on the first type of code (humans are better), and useless on the second (performance doesn't matter). So what good are they at all?

If optimizing compilers aren't useful, what system should we use instead for making performant code? The author and collaborators' experience suggests that the reason a compiler can't optimize code as well as a human when it matters is that our current programming languages don't give the compiler enough information about the intent of the code to optimize it. Therefore we should design programming languages that on the surface look very unoptimized, but specify enough information that compilers can do a really good job. It sounds like no one knows what such a programming language would look like.
ezyang · about 10 years ago
The talk was given to a packed room at ETAPS, a European computer science conference leaning more on the theoretical side (I suppose everyone was curious how optimizing compilers were dying). All in all, the audience did not bust out the pitchforks, although one might say that "domain-specific compilers" is basically the direction academia has already been heading. I doubt any of the people in the audience who were working on compilers/JITs are planning to stop working on them, though it did make for some fun dinner discussion.

One mid-talk exchange had a professor asking djb upfront whether or not he thought, in ten years, Mike Pall (author of LuaJIT) would be out of a job--after all, JITs are basically optimizing compilers. Well, the original question was more diplomatic than that, but eventually he pushed it enough that he got djb to not deny that this would be the case.

The talk was somewhat marred by a very large digression into an undergrad-level primer on computer architecture (it probably would have been better served by an extended Q&A session), although the sorting example he finally built up to was pretty cute.
robmccoll · about 10 years ago
I really like the conversation-with-the-compiler approach. I had the good fortune to write some code for one of these: http://en.m.wikipedia.org/wiki/Cray_XMT which is a multi-socket, TB-scale shared-memory machine with 128 hardware thread contexts per socket. It has an autoparallelizing C compiler that attempts to parallelize mostly for loops where it thinks it can (really quite a clever thing), but you can also tell it where to do things and give it hints through compiler pragmas. The compiler infrastructure will print out annotated copies of the code that tell you where it did and didn't parallelize and why. The effect is that you have this conversation with the compiler in which each of you tries to tell the other how to make the code more parallel (which in the case of the XMT means better and faster). It's very simple but the result can be orders of magnitude improvement.
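To make that hint-and-report loop concrete, here is a minimal sketch of the same workflow using OpenMP pragmas (an analogue chosen for illustration; the XMT compiler's own directives and annotated listings differ in detail):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        enum { N = 1 << 20 };
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        if (!a || !b) return 1;

        for (int i = 0; i < N; i++)
            b[i] = (double)i;

        /* The hint: the programmer asserts the iterations are independent.
           The compiler parallelizes the loop, and with a report flag such as
           GCC's -fopt-info it tells you what it did and didn't transform, so
           the hints can be refined in the next round of the "conversation". */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        printf("%f\n", a[N - 1]);
        free(a);
        free(b);
        return 0;
    }

Built with something like "gcc -O2 -fopenmp -fopt-info", the compiler's report plays the role of the XMT's annotated source listing.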
p932 · about 10 years ago
Some bits of Fran Allen in the Coders at Work book: http://www.codersatwork.com/fran-allen.html

"We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization. The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue."

"We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are . . . basically not taught much anymore in the colleges and universities."
mafribe · about 10 years ago
I attended djb's ETAPS talk and am still not sure if he was being deliberately provocative or genuine. Assuming the latter, I disagree with several of his points. Here I want to bring one to the readers' attention, and it's to do with the economics of correctness proofs.

One of his key arguments was that compiler optimisations are difficult to prove correct, and that's one of the reasons why optimising compilers will be replaced by a combination of simple compilers + assembly hand-written by domain experts. It is true that such proofs are (currently) expensive, but this misses the point of the economics of correctness proofs: correctness proofs are difficult, but proving compilers correct amortises that cost over all subsequent uses. In contrast, program-specific correctness proofs are typically of comparable difficulty, but don't amortise in this way. Therefore it seems cheaper in the long run to focus on the correctness of optimising compilers. Moreover, compilers and optimisations are quite a restricted class of algorithms, so it is more likely that we can reuse (parts of) correctness proofs and prover technology for compilers.
peapicker · about 10 years ago
To finish the Don Knuth quote: "There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

"Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."
nullc · about 10 years ago
I often boggle at people who claim that compilers are magic and outperform humans--perhaps that's true for unimportant code that you'd pay no attention to, or with developers who aren't familiar with the underlying micro-architecture at all.

It's pretty usual for me to see a factor of 2 in performance for the same algorithm implemented in the same manner when moving from SIMD intrinsics (which almost directly map to the underlying platform) to hand-coded ASM.

Even non-SIMD code can result in some pretty stark changes.

A non-SIMD example from a crypto library I work on which isn't (yet) very well optimized for ARM, benchmarked on my Novena (with GCC 4.9.1 -O3):

ecdsa_verify: min 1927us / avg 1928us / max 1929us

And a hand conversion of two hot inner functions (which are straight-line multiply and add lattices, no control flow) into ARM assembly:

ecdsa_verify: min 809us / avg 810us / max 811us

Again, same algorithm.

(The parallel change for x86_64 is still significant, but somewhat less extreme; in that case converting the same functions is only a 15% speedup overall, partially because the 64-bit algorithm is different.)

When that's a difference which results in 2x the amount of hardware (or 15% for that matter) for a big deployment, it can justify a LOT of optimization time.

(Or in my case, the performance of this code will have a material effect on the achievable scale and decentralization of the Bitcoin network.)

From a straight development-time vs. performance perspective I'd use even more hand-written code; ... but there is a maintenance/auditing/review/verification overhead too. And often the same code that you cannot tolerate being slow you also cannot tolerate being wrong.
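For anyone wondering what SIMD intrinsics that "almost directly map to the underlying platform" look like, here is a small illustrative sketch (SSE, plain C; not code from the library mentioned above). Each intrinsic corresponds closely to one instruction, while register allocation and scheduling are still left to the compiler, which is exactly the part hand-written assembly takes back:

    #include <immintrin.h>

    /* dst[i] += a[i] * b[i] using SSE intrinsics: _mm_loadu_ps, _mm_mul_ps,
       _mm_add_ps and _mm_storeu_ps each map closely to a single instruction. */
    void fma_arrays(float *dst, const float *a, const float *b, int n)
    {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            __m128 vd = _mm_loadu_ps(dst + i);
            vd = _mm_add_ps(vd, _mm_mul_ps(va, vb));
            _mm_storeu_ps(dst + i, vd);
        }
        for (; i < n; i++)      /* scalar tail for leftover elements */
            dst[i] += a[i] * b[i];
    }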
tel · about 10 years ago
Copied from lobste.rs since I think it'd be interesting to the audience here as well

---

This "dialogue with the compiler" bit that djb lands on is in some sense obviously the right way to go forward. I've found this to be the case not in optimization--though I'm not in the least bit surprised to see it there--but instead in the world of dependent types. The language that the program writer writes in is often just a skeleton of the genuine information of the program. For instance, in a dependently typed program it's often very difficult for an author to immediately write all of the complex intermediate types required to drive proofs forward, but it's much easier to achieve this in collaboration with the compiler (really the typechecker, e.g. via a tactics language and an interactive Proof General-like interface). The ultimate result, the "elaborated" program, contains much, much more information than the skeletal program the programmer originally wrote. It has been annotated by the collaboration of the compiler and the program writer to extract more of the programmer's intention.

The same kind of thing could clearly arise from a "collaboration" over optimization. It's even quite possible that these are the same dialogue, as dependent types certainly provide much more information to the compiler about the exact properties the code ought to substantiate--in a nice, machine-readable format even.
dfbrown · about 10 years ago
An optimizing compiler may not be better than me at optimizing hot code paths, but my time is a very limited resource. The compiled version may only be 75% as fast as my hand optimized version, but writing that hand optimized version will likely take several times longer. Sometimes it is worth spending the extra time for that performance, but usually it is not.
corysama · about 10 years ago
Very related, but surprisingly not covered, I'll point out what was covered in depth in Mike Acton's CppCon14 keynote "Data-Oriented Design and C++":

https://www.youtube.com/watch?v=rX0ItVEVjHc

http://www.slideshare.net/cellperformance/data-oriented-design-and-c

And that is: because of the ever-growing disparity between ALU vs. IO speeds, the vast majority of time spent waiting on computers is because of issues that the compiler can not optimize. In general, compilers have very few opportunities to rearrange your data structures without your explicit, manual input. They can't help your CPU stall on memory/disc/network IO less by any significant amount. They can only help when your CPU actually has the data it needs to proceed--which often is less than 20% of total execution time.

In that case, no matter how smart GCC gets, it probably can't ever speed up your existing code by more than 20% over what it does today. It's not allowed to by the spec. I'm not aware of any general-purpose language where this is an option to any significant degree (silent AOS-to-SOA, hot-vs-cold data segregation, tree clustering, etc...).

If your program is too slow, it's almost certainly because you haven't done the hard, still-manual work of optimizing your data access patterns. Not just your Big-O's (N^2) vs. (N log N), but also your Big-O's hidden, implicit K. The K that academia actively ignores and that most people rarely think about because it is mostly composed of cache misses that are implicit and invisible in your code. x = sqrt(y) is super cheap compared to x = *y. But the same people who fret over explicit ALU costs usually think very little of x->y->z.
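A tiny sketch of the AoS-to-SoA rewrite being described, purely illustrative (the structs are hypothetical), showing the kind of layout change no mainstream C compiler will make silently:

    /* Array-of-structs: a pass over just the x field drags name and hp
       through the cache as well. */
    struct EntityAoS { char name[24]; float x, y, z; int hp; };

    float sum_x_aos(const struct EntityAoS *e, int n)
    {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += e[i].x;        /* one useful float per 40-byte struct */
        return s;
    }

    /* Struct-of-arrays: the hot field is contiguous, so the same loop
       touches only the cache lines it actually needs. */
    struct EntitiesSoA { float *x, *y, *z; int *hp; char (*name)[24]; int count; };

    float sum_x_soa(const struct EntitiesSoA *e)
    {
        float s = 0.0f;
        for (int i = 0; i < e->count; i++)
            s += e->x[i];       /* dense, prefetch-friendly accesses */
        return s;
    }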
mightybyte · about 10 years ago
If you agree with the author's conclusion that we need better languages that allow us to give the compiler more information about the optimization needs of our program, then I think you have to look in the direction of languages like Haskell, Idris, etc. Fortran can be faster than C because C has aliasing that limits the optimizations that the compiler can perform. Similarly, strongly typed and pure languages like Haskell give you even more. You can do a lot with types, but on top of that Haskell allows you to define rewrite rules that tell the compiler how things can be optimized. This allows the compiler to automatically convert things like this:

    map f (map g items)

...into this:

    map (f . g) items
Animats · about 10 years ago
There are a few basic optimizations we should routinely have in compilers today, but don't always find there.

-- Multidimensional array optimizations. Basically, get to the level of subscript-calculation overhead seen in FORTRAN compilers of 50 years ago.

-- Subscript checking optimization. Work on Pascal compilers in the 1980s showed that about 95% of subscript checks could be eliminated or hoisted out of inner loops without loss of safety. This was forgotten during the C era, because C is vague about array sizes. Go optimizes out checks for the simple cases; Rust should and probably will in time. Optimization goal: 2D matrix multiply and matrix inversion should have all subscript checks hoisted out of inner loops or eliminated. Compilers that don't have this feature lead to users demanding a way to turn off subscript checking. That leads to buffer overflows. (Compilers must know that it's OK to detect a subscript check early; it's OK to abort before entering a loop if there will inevitably be a subscript error at iteration 10000.)

-- Automatic inlining. If the call is expensive relative to the code being called, inline, then optimize. Ideally, this should work across module boundaries.

-- Atomic operations and locking. The compiler needs to know how to do those efficiently. Calling a subroutine just to set a lock is bad. Making a system call is worse. Atomic operations often require special instructions, so the compiler needs to know about them.

-- Common subexpression elimination for pure functions. Recognize pure functions (where x = y => f(x) = f(y) and there are no side effects) and routinely optimize. This is essential in code with lots of calls to trig functions; a sketch follows below.
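On that last item, here is a minimal sketch of what giving the compiler that licence looks like today with the GCC/Clang function attributes (the attribute is real; the example function is hypothetical). Declaring a function const promises that equal arguments give equal results with no side effects, which is exactly what allows the call to be folded or hoisted out of a loop:

    #include <math.h>

    /* Programmer's promise: same input, same output, no side effects. */
    __attribute__((const)) double window(double x)
    {
        return 0.5 - 0.5 * cos(x);
    }

    double apply_window(const double *buf, int n, double w)
    {
        double g = 0.0;
        for (int i = 0; i < n; i++)
            g += buf[i] * window(w);   /* loop-invariant call: may be computed once */
        return g;
    }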
sp332 · about 10 years ago
More discussion from a month ago: https://news.ycombinator.com/item?id=9202858
AKrumbach · about 10 years ago
"For some reason we all (especially me) had a mental block about optimization, namely that we always regarded it as a behind-the-scenes activity, to be done in the machine language, which the programmer isn't supposed to know. This veil was first lifted from my eyes when I ran across a remark by Hoare that, ideally, a language should be designed so that an optimizing compiler can describe its optimizations in the source language. Of course!"

That sounds like he wants some sort of homoiconic assembly or machine language to target. Does such a thing even exist?
DannyBee · about 10 years ago
He actually quotes my rebuttal comment: "Except, uh, a lot of people have applications whose profiles are mostly flat, because they've spent a lot of time optimizing them."

And his response is "this view is obsolete, and to the degree it isn't, flat profiles are dying".

Oh great, that's nice, I guess I can stop worrying about the thousands of C++ applications Google has built that display this property, and ignore the fact that, in fact, the profiles have gotten more flat over time, not less flat.

Pack it up boys, time to go home!

Basically, he's just asserting I'm wrong, with little to no data presented, when I'm basing mine on the results of not only thousands of Google programs (which I know with incredible accuracy), but the thousands of others at other companies that have found the same. I'm not aware of him poring over performance bugs for many many thousands of programs for the past 17 years. I can understand if he's done it for his open source programs (which are wonderful, BTW :P).

He then goes on to rebut other comments with simple bald assertions (like the LuaJIT author's one) with, again, no actual data.

So here's some more real data: GCC spent quite a while optimizing interpreter loops, and in fact did a better job than "the experts" or whatever on every single one it was handed.

So far, as far as I can tell, the record is: if GCC didn't beat an expert at optimizing interpreter loops, it was because they didn't file a bug and give us code to optimize.

There have been entire projects about using compilers/JITs to supplant hand-written interpreter loops.

Here's one: https://code.google.com/p/unladen-swallow/wiki/ProjectPlan

While the project was abandoned for other reasons, it produced 25+% speedups over the hand-written interpreter versions of the same loop by doing nothing but using compilers.

Wake me up when this stops happening...

He then goes on to make further assertions misunderstanding compiler authors and what they do: "A compiler will not change an implementation of bubble sort to use mergesort. ... they only take responsibility for machine-specific optimization".

This is so false I don't know where to begin. Compilers would, if they could, happily change algorithms, and plenty do. They change the time bounds of algorithms. They do, in fact, replace sorts. Past that, the problem there is not compilers, but that the semantics of languages often do not allow them to safely do it.

But that is usually a programming language limitation, and not a "compilers don't do this" problem.

For example, the user may be able to change the numerical stability of an algorithm, but the programming language may not allow the compiler to do so.

Additionally, it's also generally not friendly to users.

As an example: ICC will happily replace your code with Intel performance primitives where it can. It knows how to do so. These are significant algorithm changes.

But because users by and large don't want the output of ICC to depend on Intel's Math Kernel Library or anything similar, they don't usually turn it on by default.

GCC doesn't perform quite as much here, because even things like replacing "printf" with "puts" have caused tremendous amounts of annoyed users. Imagine the complaints if it started replacing algorithms.

Past that, I'd simply suggest he hasn't looked far enough into the history of optimizing compilers, because there has been tons of work done on this. There are plenty of high-level language optimizers that have been built that will completely rewrite or replace your code with rewritten algorithms, etc.

I stopped reading at page 50.
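The printf-to-puts substitution mentioned above is easy to check for yourself; a minimal example (the transformation is well documented for GCC and Clang at -O1 and above, and visible in the -S output):

    #include <stdio.h>

    int main(void)
    {
        /* A constant format string ending in '\n' with no conversions is
           typically compiled as puts("hello, world"): a small but real case
           of the compiler swapping one library routine for a cheaper one. */
        printf("hello, world\n");
        return 0;
    }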
pron · about 10 years ago
His vision:

"The time is clearly ripe for program-manipulation systems... The programmer using such a system will write his beautifully-structured, but possibly inefficient, program P; then he will interactively specify transformations that make it efficient."

But what if the answers the programmer gives the compiler turn out not to match reality, and some weird bug is introduced that has no representation in the source? The compiler's decisions need to be somehow spelled out in debuggable form.

There is another approach (that can also be complementary). The programmer specifies in advance various specific scenarios, and a JIT compiler guesses which of the scenarios is in effect and optimizes for that (e.g. a certain condition, like one on input size, is always true), but adds a guard (that hopefully adds negligible overhead). If the scenario does not match reality, the JIT deoptimizes and tries another. This process itself, of course, adds overhead, but it's warmup overhead, and it's more robust. This is the approach being tried by Graal, HotSpot's experimental next-gen JIT (and Truffle, Graal's complementary programming-language construction toolkit aimed at optimization): https://wiki.openjdk.java.net/display/Graal/Publications+and+Presentations
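A rough sketch of that speculate-guard-deoptimize shape in plain C, purely illustrative (a real JIT such as Graal emits the fast path as machine code and deoptimizes via on-stack replacement; the names and thresholds here are made up):

    #include <stddef.h>
    #include <string.h>

    typedef struct { int speculating; int guard_failures; } profile_t;

    static void copy_generic(char *dst, const char *src, size_t n)
    {
        memcpy(dst, src, n);            /* the unspecialized, always-correct path */
    }

    void copy_speculative(profile_t *p, char *dst, const char *src, size_t n)
    {
        if (p->speculating && n <= 16) {        /* guard: "inputs are always small" */
            for (size_t i = 0; i < n; i++)      /* specialized fast path */
                dst[i] = src[i];
            return;
        }
        if (p->speculating && ++p->guard_failures > 8)
            p->speculating = 0;                 /* scenario was wrong: "deoptimize" */
        copy_generic(dst, src, n);
    }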
acqq · about 10 years ago
Even if DJB wrote some very effective code, now when he "goes meta" he somehow ends up in the strange area of being "not even wrong." Or maybe we miss his ideas when we read the slides instead of hearing him at the talk.

People who make compilers used in production know: if the naive users claim that "optimizing compilers don't matter," it's because the optimizing compilers are so good at doing what they're doing.

There's an argument buried deep in the discussions here which I think DJB missed addressing, nicely stated by haberman:

"If you want to argue that optimizing compilers are dead, you'd have to show that you can remove optimizing compilers from your toolchain, and have nobody notice the difference."
zurn · about 10 years ago
Compiler optimization is obviously a sleeping field currently. There's nothing that's made it into practical compilers to address the bottlenecks shifting from ALU work to data storage and layout considerations.

Consider all the gnashing of teeth and wringing of hands that goes on in C++ circles about inefficient data layout & representation by inferior programmers, and the stories of victorious manual data layout refactorings by performance heroes.

DJB's slides don't address the data side because he only does crypto, and that's one of the fields where the ALU twiddling is still relevant. But crypto is also rarely a bottleneck.
spiritplumber · about 10 years ago
So, Mel Kaye got the last word in?

http://en.wikipedia.org/wiki/The_Story_of_Mel
nitwit005 · about 10 years ago
Interacting with a compiler sounds horrible. Think of the questions it might need to ask: "Hey! I could use AVX2 instructions here, after I inline a bunch of stuff and eliminate some dead code, but it requires doing a bunch of memory copying. Is that a good idea?" How would you answer a question like that?

And then, since optimizations are target-dependent, you would need to go through this exercise for each target. Sounds fun.
rurban · about 10 years ago
So, just for a start: which optimizing compiler actually properly solves the optimization problems? I know of none.

When I look at the list of optimization solvers for constrained linear or non-linear programming models (i.e. https://en.wikipedia.org/wiki/List_of_optimization_software) and the list of compilers, the intersection is still zero.

All optimizers are still using their own ad-hoc versions of oversimplified solvers, which never fully explore their problem space. Current optimizing compilers are still just toys without a real-world solver. And it's clear why: current optimizable languages are still just toys without a real-world solvable optimization goal.

You can think of strictly typed, sound declarative languages where solvers would make sense, or you can think of better languages, like Fortress or functional languages, which are not encumbered by the not-properly-optimizable side effects and aliasing that harm most modern language designs.
phkahler · about 10 years ago
I'd be happy if C and C++ had 2-, 3-, and 4-element vectors as built-in types, along with cross and dot product operations. There are intrinsics, and GCC has its own intrinsics that can be used across architectures. But the languages need to have these. They are so fundamental to so many things.

There are many more things to wish for, but I'm starting with one of the simplest.
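The closest thing available today is the GCC/Clang vector_size extension (a real extension, though not standard C; the helper functions below are just illustrative): element-wise arithmetic comes for free, while dot and cross products still have to be spelled out by hand.

    typedef float float4 __attribute__((vector_size(16)));

    /* Element-wise operators (+, -, *) work directly on vector types. */
    static inline float4 vmul(float4 a, float4 b) { return a * b; }

    static inline float dot3(float4 a, float4 b)
    {
        float4 m = a * b;
        return m[0] + m[1] + m[2];
    }

    static inline float4 cross3(float4 a, float4 b)
    {
        float4 r = { a[1] * b[2] - a[2] * b[1],
                     a[2] * b[0] - a[0] * b[2],
                     a[0] * b[1] - a[1] * b[0],
                     0.0f };
        return r;
    }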
anewhnaccount · about 10 years ago
Here's a compiler which uses program synthesis to target a mesh-network-type architecture: http://pl.eecs.berkeley.edu/projects/chlorophyll/ . It uses a guided process like the one implied in the last slides.
troydj · about 10 years ago
The original abstract for the talk (which summarizes the slides) was posted by the author here:

http://blog.cr.yp.to/20150314-optimizing.html
carapace · about 10 years ago
This is very good. I've been working towards something similar to what DJB is talking about. (Nice to know I'm in good company. :)

In a nutshell, although automated systems will be good (are already, and getting better), there will always be aspects that require humans in the loop (unless and until the machines actually become sentient, defined in this context as gaining that je ne sais quoi that humans do seem to have.)
copsarebastards · about 10 years ago
The cases where optimizing compilers aren't good enough are where the Java HotSpot compiler and similar techniques really shine. Combined with novel superoptimization techniques, hotspot optimization could far outperform hand-optimization (although AFAIK that hasn't happened in practice yet).
jokoon · about 10 years ago
Isn't that why people advocate C? Isn't C just the type of language where you can tell the compiler how to optimize?

C might not give very explicit information on how to optimize, but isn't it simple and bare enough to let the compiler do a better job?
wolf550e · about 10 years ago
Audio of the talk: http://cr.yp.to/talks/2015.04.16/audio.ogg
raverbashing · about 10 years ago
What a waste of time.

Yes, specialists can squeeze out the last performance improvements in ASM compared to C. That doesn't mean -O2/-O3 and auto-vectorizing can't do a nice job and get to 90% of that.

Optimizing DOES matter. Just compare -O0 and -O1. Really. It's not because CPUs are fast that people shouldn't do that and compilers shouldn't optimize a bare minimum.

It's even better that the compiler does that, because optimizing ASM by hand is very error-prone.

And compilers get better every day. Just look at LLVM.
faragon · about 10 years ago
Room for optimizing compilers = distance between programming languages and CPU instructions/microarchitecture
asgard1024 · about 10 years ago
He is soooo spot on! This "dialogue with the compiler" is going to be really big in the next decades, but it's in no way the death of automatic optimization; it's just the beginning of it.

Here's a simple example of how I expect it to work: you write code that uses a list-like data structure. The compiler then instruments the code and you run some tests. The tests will then be evaluated and the (post?)compiler selects what kind of data structure is to be used (to keep the example simple, the choices are array vs. list). For instance, if you need to look up elements a lot (based on the evidence from testing), an array will be chosen as the underlying data structure.

And you actually get (if you want to see it; normally this information will be hidden) a little box in the IDE, where the variable is used, that tells you: "Here, an array will be used." You can then with one click say: "I don't want an array, make it a list." So for all the possible optimizations, there are two viewpoints presented: the viewpoint of the compiler (based on evidence from the tests or static analysis) and the viewpoint of the programmer (which allows for confirmation or override in case there are some unknown assumptions).

And if the specifications change (say, we chose a list earlier but now we actually have a lot of direct access to elements), you can just recompile the same code with the previously-agreed compiler choices removed! And without changing any line of code, a different and more fitting data structure will be used.

You can easily see this can apply to many things, not just data structures. You can also see that different ways of implementing the dialogue are possible, once we syntactically separate the "what" from the "how" in the programming language (a rough sketch of that separation follows below). In the future, I believe, we will program just with abstract data types, and the concrete type will be selected based on evidence from the running program (or static analysis augmented with that information). So the dialogue will not happen just with the programmer; the compiler will also observe the real-world behavior of the program and facilitate adaptation to it.

In this way, it's even possible to input assumptions that don't have to be provably correct. This approach can potentially bridge the static vs. dynamic types divide, and others.

Finally, Haskell and functional languages are very nice, but I don't think they are the final word in programming. If we wanted the above, they have syntactical problems, such as the mixing of concrete and abstract types (type classes). Also, there are limits to static analysis in the real world. The future will be a lot more interesting.
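To ground the "what" vs. "how" separation in something concrete, here is a purely illustrative C sketch (all names hypothetical): the program is written against an abstract sequence, and the representation behind it is the part a profile-driven tool could swap without touching the calling code.

    typedef struct seq_ops {
        void (*push)(void *rep, int value);
        int  (*get)(void *rep, int index);   /* cheap for an array, O(n) for a list */
    } seq_ops;

    typedef struct seq {
        const seq_ops *ops;   /* the "how": array backend or list backend */
        void          *rep;
    } seq;

    /* The "what": code written only against the abstract interface. The access
       pattern seen here (lots of indexed reads) is the evidence an instrumented
       build would use to pick the array representation. */
    int sum_every_tenth(seq *s, int n)
    {
        int total = 0;
        for (int i = 0; i < n; i += 10)
            total += s->ops->get(s->rep, i);
        return total;
    }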
jingo · about 10 years ago
The birth of an optimizing assembler.
bcheung · about 10 years ago
I would prefer death to the font-size: 2^1000px
MrPatan · about 10 years ago
Trying to read this gave me cancer