Early in my career, I reached out to the author and was able to grab lunch with him; he was about to retire! It was insightful to hear his thoughts on system performance, particularly on systems involving more than one machine, which he had studied deeply.<p>It gave me an appreciation for how much knowledge one accumulates over a career, and what a loss it is to an organization when someone so knowledgeable retires.
I had the pleasure of working with Dick last year on getting KUtrace to work on Android devices. It was a great experience collaborating with one of the greats of systems performance. He was a wealth of information on performance bottlenecks and optimizations.<p>KUtrace is absolutely one of the most powerful tools I've used for deeply understanding performance bottlenecks, such as poor scheduling behavior, once an issue has been isolated. I would highly recommend reading his book "Understanding Software Dynamics" [1] if you are interested in learning more about KUtrace or about performance bottlenecks and optimizations in general. The book is quite dense and dives deep into many examples involving what Dick considers the five fundamental resources: CPU, memory, disk/SSD, network, and software critical sections.<p>[1]: <a href="https://www.oreilly.com/library/view/understanding-software-dynamics/9780137589692/" rel="nofollow">https://www.oreilly.com/library/view/understanding-software-...</a>
Sounds very interesting.<p>But it works by patching the kernel rather than relying only on eBPF, as many recent performance tools do. So it needs constant maintenance, given the current pace of internal kernel changes. And I would not be surprised if it failed to build or work correctly on a heavily patched and customized kernel.<p>On the positive side, at first glance, the maintenance to keep up with new kernels looks very active.
Out of curiosity, is BPF now capable of capturing all context-switch events, such as CPU traps?<p>Also, if the overhead is negligible, maybe the author could try to merge this into mainline, using a static key to make the overhead switchable. Even with a static key, though, the degree of accompanying interference with the cache and branch predictor might be an intriguing topic.