I'm an application developer with decent "high-level" performance tuning skills. I can profile my application code and fix bottlenecks, but eventually I hit a wall. Once I've addressed the low-hanging fruit, I know there are probably still 10-100x+ performance improvements available, but they are out of reach with my current skills.<p>I don't know how to find and fix things like excessive page faults, L1/L2 cache misses, branch mispredicts, context switches, etc. What you might call "mechanical sympathy."<p>For those with these skills, how did you learn? How would you recommend someone develop this skillset today?
This blog is a great place to get started: <a href="https://easyperf.net/" rel="nofollow noreferrer">https://easyperf.net/</a><p>As with all things, practice is an essential part of improving!<p>Then there's learning from real achievements, such as the fast inverse square root or the 55GB/s FizzBuzz example: <a href="https://codegolf.stackexchange.com/questions/215216/high-throughput-fizz-buzz/236630#236630" rel="nofollow noreferrer">https://codegolf.stackexchange.com/questions/215216/high-thr...</a>
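For anyone who hasn't seen the fast inverse square root mentioned above, here is a minimal sketch of the widely circulated version of the trick. The function name `fast_rsqrt` is my own, and I've used `memcpy` for the bit reinterpretation rather than the pointer cast usually shown, to avoid undefined behavior. Treat it as a study aid, not something to ship: modern CPUs generally have dedicated reciprocal square root instructions.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for positive, finite x.
 * fast_rsqrt is a hypothetical name chosen for this sketch. */
float fast_rsqrt(float x) {
    float half = 0.5f * x;
    uint32_t bits;

    /* Reinterpret the float's bits as an integer without violating
     * strict aliasing. */
    memcpy(&bits, &x, sizeof bits);

    /* Shifting right roughly halves the exponent; subtracting from the
     * magic constant negates it, giving a first guess at x^(-1/2). */
    bits = 0x5f3759dfu - (bits >> 1);

    float y;
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step to refine the guess. */
    y = y * (1.5f - half * y * y);
    return y;
}
```

Calling `fast_rsqrt(4.0f)` should give a value close to 0.5. The useful exercise is working out why the integer subtraction produces a reasonable first guess, since that forces you to think about how floats are laid out in memory.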