C is simple to compile into nearly optimal code, or at least it was back in the 1970s, when computers dispatched a single opcode at a time or had trivial pipelines, had no SIMD hardware and no other parallelism worth mentioning, and cache wasn't worth worrying about much. (Running in the registers was a neater trick.)<p>That meant it was relatively simple to 'see' the assembly language 'behind' a given C function or stretch of code. It didn't take much to get inside the head of a C compiler, so you could be reasonably sure that a simple piece of C would come out the other end as a similarly simple piece of assembly.<p>That, of course, was all well and good when it was reasonably simple to predict actual performance from a glance at the assembly code, which assumes opcode performance (as opposed to, say, cache performance) dominates how fast the code runs.<p>Now... how many of those things still hold true on desktop- and server-class hardware?
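<p>To make the opcode-versus-cache point concrete, here is a rough sketch, not a benchmark. The function names and the flat row-major layout are my own assumptions for illustration. Both functions do exactly the same additions over the same elements, and the only difference is the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row-major in one flat array,
   visiting elements in the order they sit in memory. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];   /* stride: 1 int */
    return total;
}

/* Same additions over the same elements, but with the loops swapped,
   so consecutive accesses are cols ints apart. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];   /* stride: cols ints */
    return total;
}
```

<p>Glancing at the unoptimized assembly, you would expect the two loops to look almost alike. On a matrix much larger than the cache I would expect the second one to run noticeably slower, but that is an assumption about typical hardware, and an optimizing compiler may interchange or vectorize the loops, so time it on your own machine before drawing conclusions.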