Great site! I have a tendency to summarize Linux performance, be it tuning or monitoring, so, taking a deep breath…<p>This is a deep subject with a long and varied list of observability tools. At a minimum, make sure you know uptime, dmesg, and iostat well. These are your friends: they give you a glimpse into load averages (uptime), kernel messages such as out-of-memory kills (dmesg), and CPU and disk I/O utilization (iostat), enough for a quick diagnostic overview of system health. This is what I call the “let me take a look at it” checklist, page 1 of 100!<p>For performance analysis methodology, I recommend careful benchmarking with before-and-after scenarios to evaluate system behavior and workload characteristics as a whole. Make small changes first, then gradually compound the ones you think will provide benefits. Remember, labs and production never behave the same.<p>This is where it gets tricky. CPU profiling with tools like “perf”, and visual aids like flame graphs, enable targeted analysis of CPU activity, along with tracking hardware events to optimize computational efficiency. You need to know more than “it’s the app, man, it was fine until the latest release from development”.<p>When you are the admin speaking to a developer, Linux tools like ftrace and BPF come into play. They allow detailed tracing of kernel function execution and system calls, which can be vital in troubleshooting and performance optimization. You can also be the developer verifying the admin’s intuition… as the saying goes, trust but verify.<p>When it’s your code, then you had better know BPF! It not only enables efficient in-kernel tracing but also powers advanced custom profiling tools through bcc and bpftrace, offering deeper insight into system performance.<p>Last comment, it’s %$$% hard! 
Tuning means adjusting a myriad of system components and kernel parameters, from CPUs and memory to network settings, to optimize performance and reliability across different workloads. Or else you can blame it on the network! :D<p>Really, you need a disciplined attitude toward change management. Chasing code or kernel parameters can be a daunting task that overwhelms everyone at exactly the moment when you are time constrained, and the pressure can lead to more human error.
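<p>To make the before-and-after benchmarking point a bit more concrete, here is a minimal sketch in Python. The function name compare_runs and the latency numbers are made up for illustration, and it assumes you have already collected the same latency metric under the same workload before and after a single tuning change. It compares medians so that one noisy outlier does not decide the result.

```python
import statistics


def compare_runs(before, after):
    """Compare two benchmark runs by median.

    Returns (median_before, median_after, percent_change), where a
    negative percent_change means the metric went down after the change.
    """
    med_before = statistics.median(before)
    med_after = statistics.median(after)
    change_pct = (med_after - med_before) / med_before * 100.0
    return med_before, med_after, change_pct


# Hypothetical request latencies in milliseconds, not real measurements.
before_ms = [12.0, 11.5, 12.4, 30.1, 11.9]
after_ms = [10.2, 10.0, 10.6, 28.7, 10.1]

med_b, med_a, pct = compare_runs(before_ms, after_ms)
print(f"median before={med_b} ms, after={med_a} ms, change={pct:.1f}%")
```

Run it once per change, keep the raw samples, and only then stack the next change on top. That way, when production disagrees with the lab, you still know which change moved the number.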