How do modern compilers choose which variables to put in registers?

400 点作者 azeemba3 个月前

10 条评论

This is perhaps my favorite Stack Overflow answer of all time. I don't remember when I last saw such an approachable explanation of something so critical yet complicated.> One of the canonical approaches, graph coloring, was first proposed in 1981.This is about as far as my professor took this topic in class ~13 years ago. Nevertheless the slides that he used to illustrate how the graph coloring problem applied to register allocation stick with me to this day as one of the most elegant applications of CS I've ever seen (which are admittedly few as I'm not nearly as studied as I ought to be).> Code generation is a surprisingly challenging and underappreciated aspect of compiler implementation, and quite a lot can happen under the hood even after a compiler’s IR optimization pipeline has finished. Register allocation is one of those things, and like many compilers topics, entire books could be written on it alone.Our class final project targeted a register-based VM [1] for this exact reason. I also found writing MIPS assembly simpler than Y86 [2] because of the larger number of registers at my disposal (among the other obvious differences).[1] <a href="https://www.lua.org/doc/jucs05.pdf" rel="nofollow">https://www.lua.org/doc/jucs05.pdf</a>[2] <a href="https://esolangs.org/wiki/Y86" rel="nofollow">https://esolangs.org/wiki/Y86</a>

评论 #43077699 未加载

评论 #43076008 未加载

评论 #43076546 未加载

评论 #43082465 未加载

评论 #43076550 未加载

jrimbault3 个月前

It doesn't surprise me much that this was written by the same author as "Parse, don't validate", very well written[0]: <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/" rel="nofollow">https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...</a>[1]: all previous threads <a href="https://news.ycombinator.com/from?site=lexi-lambda.github.io">https://news.ycombinator.com/from?site=lexi-lambda.github.io</a>

评论 #43076644 未加载

WalterBright3 个月前

The Digital Mars D compiler register allocator:1. intermediate code is basic blocks connected with edges. Each block has a bit vector the size of which is the number of local variables. If a variable is referenced in a basic block, the corresponding bit is set.2. basic blocks are sorted in depth first order3. variables are sorted by "weight", which is incremented for each use, incremented by 10 for each use in a loop, by 100 for each use in a nested loop, etc.4. code is generated with nothing assigned to registers, but registers used are marked in a bit vector, one per basic block5. Now the register allocator allocates registers unused in a basic block to variables that are used in the basic block, in the order of the weights6. Assigning variables to registers often means less registers are used for code generation, so more registers become available, so the process is done again until no more registers can be assignedThere are more nuances, such as variables passed to a function via registers, which introduces complications - should it stay in a register, or be moved into memory? But dealing with that is why I get paid the Big Bucks.

评论 #43084402 未加载

kragen3 个月前

It's surprising that such a high-quality answer lacks a mention of the polynomial-time optimal register allocation algorithms in common use in current compilers. It's true that graph coloring is NP-complete, so you can't solve graph coloring in polynomial time, but graph coloring is register allocation with an additional constraint added: that you can never move a variable from one register to another.Removing that constraint turns out to allow guaranteed polynomial time solutions, and that result is now widely used in practice.

评论 #43099556 未加载

artemonster3 个月前

Can anyone with a good knowledge explain why aren't we also explicitly (with the help of compiler) managing cache as well? Register allocation is basically lowest form of cache on which you can operate on with most efficiency. Why on the next level we rely on our hardware to do the guesswork for us?

评论 #43077053 未加载

评论 #43077796 未加载

评论 #43084174 未加载

评论 #43079709 未加载

评论 #43076688 未加载

评论 #43085758 未加载

评论 #43081415 未加载

评论 #43076992 未加载

kibwen3 个月前

I'd be curious to see a "high-level structured assembly language" that gives the programmer actual control over things like register allocations, in a more systematic way than C's attempt. You might say "optimizers will do a better job", and you're right, but what I want to see is a language that isn't designed to fed into an optimizing backend at all, but turned into machine code via simple, local transformations that a programmer can reliably predict. In other words, as high-level systems languages like C lean more and more heavily on optimizing backends and move away from being "portable assembly", I think that opens up a conceptual space somewhere below C yet still above assembly.

评论 #43082958 未加载

评论 #43082732 未加载

评论 #43080715 未加载

评论 #43081617 未加载

评论 #43081953 未加载

userbinator3 个月前

I find it a little amusing that the given example function does a computation which could've easily been simplified into a single instruction on x86, which needs only a single register (assuming the input x and the return value are both in eax):<pre><code> lea eax, [eax+eax*4+7] </code></pre> ...and it's likely that a compiler would do such simplifications even before attempting register allocation, since it can only make the latter easier.This is particularly likely to be necessary if the desired instruction is already using indirect addressing, and both register allocation and instruction selection must take those constraints into account.As a long-time Asm programmer, I believe that instruction selection and register allocation are inseparable and really need to be considered at the same time; attempting to separate them, like what most if not all compilers do, results in (sometimes very) suboptimal results which is easily noticeable in compiler-generated vs human-generated code.

评论 #43076347 未加载

评论 #43076293 未加载

评论 #43078880 未加载

评论 #43076588 未加载

评论 #43076996 未加载

评论 #43076275 未加载

Aardwolf3 个月前

Perhaps this is layman's understanding, but afaik all CPU operations are only done on registers, so any variable has to be moved to registers when it's operated on (or are some operations possible on memory without going to registers?) and so the answer to "which variables" would be "all of them".Or is the question about which variables are kept in registers for a bit longer time even while they're not actively being computed on right now?

评论 #43079760 未加载

评论 #43080388 未加载

评论 #43080449 未加载

评论 #43079043 未加载

评论 #43082064 未加载

dapperdrake3 个月前

Registers vs x87 stack with free swap a.k.a. rotating stack. Sub-optimal edge cases for both exist:[1] <a href="http://cr.yp.to/qhasm/20050210-fxch.txt" rel="nofollow">http://cr.yp.to/qhasm/20050210-fxch.txt</a>[2] <a href="https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-code-breadboard/" rel="nofollow">https://pvk.ca/Blog/2014/03/15/sbcl-the-ultimate-assembly-co...</a>

travisgriggs3 个月前

Reading the top voted answer made me feel like I was reading one if Jon Hannibal Stokes CPU praxis write ups from the early years of ARS. Anyone else remember those?