Nice summary! Additional changes I have planned:<p>- Removing per-instruction timers, which add a measurable overhead even when disabled (<a href="https://github.com/llvm/llvm-project/pull/97046">https://github.com/llvm/llvm-project/pull/97046</a>)<p>- Splitting AsmPrinterHandler (used for unwind info) and DebugHandler (also used for per-instruction location information) to avoid two virtual function calls per instruction (<a href="https://github.com/llvm/llvm-project/pull/96785">https://github.com/llvm/llvm-project/pull/96785</a>)<p>- Removing several maps from ELFObjectWriter, including some std::maps (changed locally, PR not yet opened)<p>- Faster section allocation and removing the ELF "mergeable section info" hash maps (although this code runs only ~40 times per object file, it is very measurable in JIT use cases that compile many small objects) (planned)<p>- Improving X86 encoding in general: it consumes a significant amount of time and looks very inefficient. Having written my own x86 encoder, I'm confident there is a lot of room for improvement. (not started)<p>Some higher-level takeaways. Most of these aren't really surprising, but they are nonetheless very frequent problems (or patterns) in the LLVM code base:<p>- Maps/hash maps/sets are quite expensive when used frequently, and they can sometimes be avoided easily, e.g., with a vector or, for pointer keys, a pointer dereference<p>- Virtual function calls (and abstraction in general) come at a cost, especially when made frequently<p>- raw_svector_ostream is slow because writes are virtual function calls that don't get inlined (I previously replaced raw_svector_ostream with a SmallVector&: <a href="https://reviews.llvm.org/D145792" rel="nofollow">https://reviews.llvm.org/D145792</a>)<p>- Frequent heap allocations are costly, especially with glibc's malloc<p>- Many small inefficiencies add up (so many small improvements do, too)
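<p>To make the map takeaway concrete, here is a minimal sketch using only the C++ standard library. `Section`, `indexViaMap`, `indexViaField`, `sizeForIndex` and `assignIndices` are hypothetical names for illustration, not LLVM APIs. It contrasts a pointer-keyed side table with keeping the value in the pointed-to object (so the lookup is a pointer dereference), and with a plain vector indexed by a dense ID:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an object-file section (not LLVM's MCSection).
struct Section {
  std::string Name;
  uint32_t Index = 0; // per-section data stored directly in the object
};

// Variant 1: side table keyed by pointer. Each lookup hashes the pointer
// and probes the table.
uint32_t indexViaMap(const std::unordered_map<const Section *, uint32_t> &M,
                     const Section *S) {
  return M.at(S);
}

// Variant 2: the same data kept in the object; lookup is a pointer dereference.
uint32_t indexViaField(const Section *S) { return S->Index; }

// Variant 3: keys are small dense integers, so a vector replaces the map.
uint64_t sizeForIndex(const std::vector<uint64_t> &Sizes, uint32_t Idx) {
  return Sizes[Idx];
}

// Assigns dense indices, filling both the side table and the in-object field
// so the two variants can be compared.
void assignIndices(std::vector<Section> &Sections,
                   std::unordered_map<const Section *, uint32_t> &Map) {
  for (std::size_t I = 0; I < Sections.size(); ++I) {
    Sections[I].Index = static_cast<uint32_t>(I);
    Map[&Sections[I]] = static_cast<uint32_t>(I);
  }
}
```

Variant 2 only works when you own the key type and can add a field to it; variant 3 needs keys that are (or can be mapped to) small dense integers. Whether either is worth it in a given spot is something to measure, not assume.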
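<p>The raw_svector_ostream point boils down to "virtual write per call vs. direct append into a concrete buffer". The sketch below shows that shape in plain standard C++. `ByteSink`, `VectorSink` and `emitLE32` are hypothetical names, std::vector stands in for SmallVector, and none of this is LLVM's actual raw_ostream interface:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical abstract sink, mimicking a stream whose writes go through a
// virtual call (not LLVM's raw_ostream API).
class ByteSink {
public:
  virtual ~ByteSink() = default;
  virtual void write(const char *Data, std::size_t Size) = 0;
};

// Concrete sink that appends to a caller-owned buffer.
class VectorSink final : public ByteSink {
public:
  explicit VectorSink(std::vector<char> &Buffer) : Out(Buffer) {}
  void write(const char *Data, std::size_t Size) override {
    Out.insert(Out.end(), Data, Data + Size);
  }

private:
  std::vector<char> &Out;
};

// Emits a little-endian 32-bit value through the virtual interface: one
// indirect call per write, which the compiler generally cannot inline unless
// it can prove the dynamic type of S.
inline void emitLE32(ByteSink &S, uint32_t V) {
  char Buf[4];
  for (int I = 0; I < 4; ++I)
    Buf[I] = static_cast<char>((V >> (8 * I)) & 0xff);
  S.write(Buf, 4);
}

// Same encoding appended directly to a concrete buffer; a plain non-virtual
// function that the compiler is free to inline into the caller.
inline void emitLE32(std::vector<char> &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xff));
}
```

Both overloads produce identical bytes; the only difference is whether each write crosses an indirect call. How much that matters depends on the call frequency, so it is a candidate for profiling rather than a guaranteed win.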