I've been experimenting with profile-guided optimization (PGO) and link-time optimization (LTO) on a variety of applications I've been developing.<p>A few things I've noticed:<p>1. PGO helps most when you have branching logic in 'hot' code (either inside a tight loop or in a dispatch function that's called repeatedly). With a good profile, the compiler can reorder the conditionals or restructure the code so that you get very good branch prediction on the common path. This often means that code that wasn't automatically vectorized before can now be reorganized to use SIMD instructions (though usually at the cost of a bigger penalty when a branch is mispredicted).<p>2. Your dependent functions need to be in the same compilation unit if you want to take advantage of many of these optimizations. Yes, LTO gives you interprocedural optimization across compilation units, but it's not perfect. If you have a loop that calls a function that can be inlined, the compiler with PGO does a great job of keeping everything hot in the instruction cache. If you put those functions in another compilation unit, not so much.<p>3. If you want to use PGO, you'd better use extremely representative inputs for the training runs. The performance of a project compiled with PGO will suffer greatly if the profile comes from unrepresentative inputs, because the compiler ends up optimizing for paths that aren't actually hot in production.
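<p>To make points 1 and 2 concrete, here is a minimal sketch of the kind of code where I'd expect PGO to pay off: a hot loop with a skewed branch, with the callees kept as static functions in the same translation unit so they can be inlined without leaning on LTO. The names (process, handle_common, handle_rare, hot.c, main.c) are made up for illustration, and the build steps in the comment assume GCC; other toolchains use different flags.

```c
/*
 * Assumed GCC workflow (hot.c / main.c / training_input are hypothetical names):
 *   gcc -O2 -fprofile-generate hot.c main.c -o app   # instrumented build
 *   ./app training_input                             # run on representative input
 *   gcc -O2 -fprofile-use hot.c main.c -o app        # rebuild using the profile
 * Adding -flto to both builds enables link-time optimization as well.
 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kinds; assume the real workload is almost all KIND_COMMON. */
enum kind { KIND_COMMON = 0, KIND_RARE = 1 };

/* static and in the same translation unit as the loop, so the compiler
 * can inline both handlers without any cross-module help. */
static uint64_t handle_common(uint64_t acc, uint32_t v)
{
    return acc + v;
}

static uint64_t handle_rare(uint64_t acc, uint32_t v)
{
    /* Stand-in for a heavier, rarely taken path. */
    return acc ^ ((uint64_t)v * 2654435761u);
}

/* Hot dispatch loop: the profile records how often each arm is taken,
 * which the compiler can use for block layout and inlining decisions. */
uint64_t process(const uint8_t *kinds, const uint32_t *values, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == KIND_COMMON)
            acc = handle_common(acc, values[i]);
        else
            acc = handle_rare(acc, values[i]);
    }
    return acc;
}
```

If the training input were mostly KIND_RARE while production is mostly KIND_COMMON, the profile would push the compiler toward the wrong arm, which is the failure mode from point 3.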