I rather suspected that compiler toolchains had this level of insight into processor architecture when doing instruction scheduling. Back in the 686 era, you could just about carry a model of the execution architecture in your head, and maybe occasionally wring an extra 3 to 5% of performance out of hand-crafted assembler without having to bring in heavy-duty Intel profilers to search for instruction stalls. And, pleasantly, the 686 architecture lasted for a long time, so it was actually a worthwhile investment to understand instruction scheduling on it.<p>Nowadays, compilers seem to produce code that beats my hand-crafted assembler pretty much every time. Sure, it's still possible to optimize C/C++ code so it makes better use of caches, or to tweak code so the compiler can make better use of the available registers. But now I know why compilers do better instruction scheduling than I can, especially given the proliferation of execution architectures. I can't imagine a human competing with that level of insight into instruction throughput anymore.<p>Thanks for the insight.
Re: instruction scheduling<p>“In this post, I went through the basics of LLVM’s scheduling model and show how to specify the scheduling information for individual instructions. On top of that, I explained different kinds of processor resource buffers and their use cases.”
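<p>For anyone curious what that per-instruction scheduling information actually looks like, here's a rough TableGen sketch in the style of an LLVM scheduling model. The target name, resource names, and latencies are all made up for illustration; real backends (e.g. in llvm/lib/Target/*/​*Sched*.td) are far more detailed:<p><pre><code>// Hypothetical machine model: 4-wide issue, out-of-order window of 192 uops.
def MyTargetModel : SchedMachineModel {
  let IssueWidth = 4;
  let MicroOpBufferSize = 192;  // 0 would mean in-order
  let LoadLatency = 4;
}

let SchedModel = MyTargetModel in {
  // Processor resources: two identical ALU pipes, one load pipe.
  def MyPortALU : ProcResource&lt;2&gt;;
  def MyPortLd  : ProcResource&lt;1&gt;;

  // Map instruction classes (SchedWrite types) onto resources.
  def : WriteRes&lt;WriteIALU, [MyPortALU]&gt; { let Latency = 1; }
  def : WriteRes&lt;WriteLoad, [MyPortLd]&gt;  { let Latency = 4; }
}
</code></pre><p>The scheduler (and tools like llvm-mca) consult these per-resource latencies and buffer sizes to model exactly the kinds of stalls we used to hunt for by hand.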