I've been experimenting with profile-guided optimization (PGO) and link-time optimization (LTO) on a variety of applications I've been developing.<p>A few things I've noticed:<p>1. PGO helps most when you have branching logic in 'hot' code (either inside a tight loop or in a dispatch function that's called repeatedly). If set up correctly, the conditional can be reordered or the code restructured so that you get very good branch prediction. This often means that code that wasn't automatically vectorized before can now be reorganized to use SIMD instructions (though this usually means a worse penalty when a branch is mispredicted).<p>2. Your dependent functions must be in the same compilation unit if you want to take advantage of many of the optimizations. Yes, cross-module interprocedural optimization (LTO) is a thing, but it's not perfect. If you have a loop that calls a function that can be inlined, the compiler with PGO does a great job of ensuring that everything stays hot in the instruction cache. If you put those functions in another compilation unit, not so much.<p>3. If you want to use PGO, you'd better use extremely representative inputs for the training runs. The performance of a project compiled with PGO will suffer greatly if the profile was collected on unrepresentative inputs.
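To make points 1 and 2 above a bit more concrete, here is a minimal sketch of the kind of code where this tends to matter: a dispatch function called from a tight loop, kept in the same compilation unit as its caller. The opcode set, the names `apply` and `run`, and the file names in the comment are all hypothetical, and the build commands assume GCC's `-fprofile-generate` / `-fprofile-use` flags (other toolchains spell this differently).

```c
/*
 * Assumed GCC workflow (file names are made up for illustration):
 *   gcc -O2 -fprofile-generate hot.c main.c -o app   # instrumented build
 *   ./app < representative_input                     # training run writes the profile
 *   gcc -O2 -fprofile-use hot.c main.c -o app        # rebuild using the profile
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode set; in a real workload one or two cases would dominate. */
enum op { OP_ADD, OP_SUB, OP_MUL, OP_RESET };

/*
 * Dispatch function. It is `static` and lives in the same translation unit
 * as its only caller, so the compiler can inline it into the loop without
 * needing LTO. With a profile available, the compiler knows which cases
 * were actually taken during training and can lay out the code accordingly.
 */
static int64_t apply(enum op o, int64_t acc, int64_t x)
{
    switch (o) {
    case OP_ADD:   return acc + x;
    case OP_SUB:   return acc - x;
    case OP_MUL:   return acc * x;
    case OP_RESET: return 0;
    }
    return acc; /* unknown opcode: leave the accumulator unchanged */
}

/* Hot loop: applies ops[i] with operand vals[i] to a running accumulator. */
int64_t run(const enum op *ops, const int64_t *vals, size_t n)
{
    assert((ops != NULL && vals != NULL) || n == 0);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = apply(ops[i], acc, vals[i]);
    return acc;
}
```

If the training run feeds `run` mostly `OP_ADD` but production traffic is mostly `OP_MUL`, the profile describes the wrong hot path, which is the situation point 3 warns about.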
This Twitter thread makes the post somewhat more interesting: <a href="https://twitter.com/BruceDawson0xB/status/793177917949739008" rel="nofollow">https://twitter.com/BruceDawson0xB/status/793177917949739008</a><p>Specifically, it covers one reason why this wasn't done years ago[1] and the bugs found when switching to PGO[2].<p>[1] <a href="https://connect.microsoft.com/VisualStudio/feedback/details/1064219/ltcg-linking-of-chromes-pdf-dll-spends-60-of-time-in-c2-dll-ssrfree" rel="nofollow">https://connect.microsoft.com/VisualStudio/feedback/details/...</a><p>[2] <a href="https://randomascii.wordpress.com/2016/03/24/compiler-bugs-found-when-porting-chromium-to-vc-2015/" rel="nofollow">https://randomascii.wordpress.com/2016/03/24/compiler-bugs-f...</a>
The article is a bit shallow. It would be nice to see:<p>1. Which kinds of PGO-driven optimizations were applied? What was the isolated impact of each of them on both the speed and the size of the code?<p>2. What tests did they use to "guide" PGO?<p>3. How did they analyze the PGO results (apart from the three tests that were provided)? I assume they did not trust it blindly, so there should be a way of visualizing the differences between two binaries built from millions of lines of code.<p>4. How did PGO affect crash statistics?
In case anyone was wondering about other browsers, Firefox has been doing PGO (and dealing with bugs) for quite some time: <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Building_with_Profile-Guided_Optimization" rel="nofollow">https://developer.mozilla.org/en-US/docs/Mozilla/Developer_g...</a>
Sort of unrelated, but this is one more example of responsive design gone wrong. The chart overflows the page on an iPhone, and because scaling is locked, I can't zoom out to view it.<p>Another example of a page that would be more readable had it stuck to plain old HTML.