LLVM doesn't actually prefer the lower registers when doing register allocation. IIRC, sunfish told me that GCC doesn't either. It would be interesting to teach the register allocator to try to minimize code size, but no compiler I know of actually does this.<p>Partially as a consequence of this, the REX prefixes take up a lot of space in most x86-64 instruction streams. In fact, the average instruction size is almost exactly 4 bytes, the same as in classic 32-bit RISC architectures. (This is why I dislike it when people link to that old Linus post about how x86 is better than RISC architectures because of code size; it may have been true then, but not now.)
N.B. the "Volatile?" column is specific to the <i>Windows</i> calling conventions. Under the Sys V calling conventions (i.e. what the world outside of Redmond uses), RDI and RSI are volatile (and used for passing the first two integer arguments).
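To make the difference concrete, here is a minimal C sketch of the callee-saved (non-volatile) general-purpose registers under each convention. The names `abi_t` and `is_callee_saved` are made up for illustration and are not from any real library, and the table covers only the integer registers (Windows also preserves XMM6-XMM15, which is left out here).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { ABI_SYSV, ABI_WIN64 } abi_t;

/* Returns true if a callee must preserve the named general-purpose
 * register under the given calling convention. Register names are
 * expected in lowercase, e.g. "rdi". */
static bool is_callee_saved(abi_t abi, const char *reg)
{
    /* Sys V: RDI and RSI are NOT in this list (they carry arguments). */
    static const char *const sysv[] = {
        "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"
    };
    /* Windows x64: same set plus RDI and RSI. */
    static const char *const win64[] = {
        "rbx", "rbp", "rsp", "rdi", "rsi", "r12", "r13", "r14", "r15"
    };

    const char *const *set = (abi == ABI_SYSV) ? sysv : win64;
    size_t n = (abi == ABI_SYSV) ? sizeof sysv / sizeof sysv[0]
                                 : sizeof win64 / sizeof win64[0];

    for (size_t i = 0; i < n; i++) {
        if (strcmp(set[i], reg) == 0)
            return true;
    }
    return false;
}
```

With this, `is_callee_saved(ABI_SYSV, "rdi")` is false while `is_callee_saved(ABI_WIN64, "rdi")` is true, which is exactly the discrepancy in the article's "Volatile?" column.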
The OP perpetuates the mistaken assumption that x86-64 looks as it does because it extends good ol' 32-bit x86 encodings, which one might assume still work so well that one could run 32-bit code in 64-bit mode and have it still work.<p>Which is not the case at all. Those REX prefix bytes used to be perfectly good 32-bit x86 instructions that now simply don't work in 64-bit mode with their original encodings. So the "compatibility" between 32-bit and 64-bit modes is mythical -- the Opteron could have had a nice shiny new 64-bit programming model that was far less confusing than the dog's breakfast that is x86-64, but just didn't.
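For anyone wondering which instructions those were: the bytes 0x40-0x4F are the one-byte INC/DEC r32 encodings in 32-bit mode, and in 64-bit mode the same bytes are REX prefixes. A rough C sketch of the two readings follows; `cpu_mode_t` and `describe_4x` are hypothetical names, and it assumes the default 32-bit operand size in 32-bit mode.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names, for illustration only. */
typedef enum { MODE_32, MODE_64 } cpu_mode_t;

static const char *const reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Describe how a leading byte in the range 0x40..0x4F is interpreted
 * in each mode. The description is written into buf. */
static void describe_4x(uint8_t byte, cpu_mode_t mode, char *buf, size_t len)
{
    if ((byte & 0xF0) != 0x40) {
        snprintf(buf, len, "not in 0x40-0x4F");
        return;
    }
    if (mode == MODE_32) {
        /* 0x40-0x47: inc r32, 0x48-0x4F: dec r32; low 3 bits pick the register. */
        snprintf(buf, len, "%s %s",
                 (byte & 0x08) ? "dec" : "inc", reg32[byte & 7]);
    } else {
        /* Same byte is a REX prefix with layout 0100WRXB. */
        snprintf(buf, len, "rex W=%d R=%d X=%d B=%d",
                 (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1);
    }
}
```

So 0x48 reads as `dec eax` in 32-bit mode and as REX.W in 64-bit mode. INC and DEC are still available in 64-bit mode through the longer FF /0 and FF /1 forms, but the short encodings are gone.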
The REX prefix: <a href="http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix" rel="nofollow">http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix</a>
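As a quick illustration of the layout described at that link (0100WRXB), here is a small C sketch that builds the prefix and uses it to encode a register-to-register `add`. The function names `rex_byte` and `encode_add_rr` are made up for this example, registers are given by their hardware numbers (0 = rax/eax, 1 = rcx/ecx, 8 = r8, 9 = r9), and only the `01 /r` form with mod=11 is handled.

```c
#include <stdint.h>

/* Build a REX prefix: fixed high nibble 0100, then the W, R, X, B bits.
 * Hypothetical helper, for illustration only. */
static uint8_t rex_byte(int w, int r, int x, int b)
{
    return (uint8_t)(0x40 | ((w & 1) << 3) | ((r & 1) << 2)
                          | ((x & 1) << 1) | (b & 1));
}

/* Encode `add dst, src` for two registers (opcode 0x01, ModRM mod=11).
 * dst and src are register numbers 0..15; is64 selects 64-bit operands.
 * Writes up to 3 bytes into out and returns the number written. */
static int encode_add_rr(uint8_t *out, int dst, int src, int is64)
{
    int n = 0;
    int w = is64 ? 1 : 0;
    int r = (src >> 3) & 1;   /* extends ModRM.reg */
    int b = (dst >> 3) & 1;   /* extends ModRM.rm  */

    /* The prefix is only emitted when one of its bits is needed. */
    if (w || r || b)
        out[n++] = rex_byte(w, r, 0, b);

    out[n++] = 0x01;          /* ADD r/m, r */
    out[n++] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7));
    return n;
}
```

`add eax, ecx` comes out as the two bytes `01 C8`, while `add r8d, r9d` needs `45 01 C8` and `add rax, rcx` needs `48 01 C8`. Touching r8-r15 or using 64-bit operands costs one extra byte per instruction, which is where the code-size overhead comes from.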
I had a question about this statement in the article:<p>"The C calling convention on x86 systems specifies that callees need to save certain registers."<p>Practically speaking, is this the prologue that the C runtime (crt0.o) provides automatically/implicitly?