I think the article is missing one big reason why we care about 99.99% or 99.9% latency metrics: we can see high latency spikes even at low utilization.

The majority of computer systems do not run at high utilization. As has been pointed out many times, computers are really fast these days, and many businesses could run for their entire lifetime on a single machine if the underlying software used the hardware efficiently. And yet, even at low utilization, we still see occasional high latency, often enough to frustrate users. Why? Because a lot of software is built on designs that intersperse low-latency operations with occasional high-latency ones. This shows up everywhere: garbage collection, disk and memory fragmentation, growable arrays, eventual consistency, soft deletions followed by later hard deletions, and so on.

What this article advocates is essentially an amortized analysis of throughput and latency, and under that analysis you do get a nice, steady relationship between utilization and latency. But for a system that may never come close to fully utilizing its hardware (which describes a large fraction of software running on modern hardware), the amortized view is not very valuable: even at very low utilization, you can see very different latency distributions depending on the software design choices above and how you tune them.

This is why many software systems don't care about the median or average latency but do care about the 99th or 99.9th percentile: there is a utilization-independent component to the statistical distribution of latency over time, and for the many systems that run at low hardware utilization, that component, not utilization, is the main determinant of the overall latency profile.
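
To make the growable-array case concrete, here is a minimal sketch (my own illustration, not from the article): it times individual appends to a Python list, a growable array whose appends are cheap on average but occasionally trigger a reallocation and copy, then compares the median against the tail percentiles.

    # Sketch: measure per-append latency of a Python list (a growable array).
    # Most appends are O(1); occasionally one triggers a resize and copy.
    import time

    def append_latencies(n=1_000_000):
        data, latencies = [], []
        for i in range(n):
            start = time.perf_counter_ns()
            data.append(i)  # usually cheap; sometimes reallocates the backing array
            latencies.append(time.perf_counter_ns() - start)
        return latencies

    def percentile(sorted_values, p):
        # Nearest-rank percentile, good enough for this illustration.
        idx = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
        return sorted_values[idx]

    if __name__ == "__main__":
        lat = sorted(append_latencies())
        for p in (50, 99, 99.9, 99.99):
            print(f"p{p}: {percentile(lat, p)} ns")

On a typical run the median append is tiny while the high percentiles are orders of magnitude larger (reallocations plus some OS scheduling noise), even though the machine is nearly idle: the latency profile is shaped by the data structure's design, not by how busy the hardware is.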