Doesn't this assume a single-threaded application? The example of a clerk's service time and people waiting in line is oversimplified. Modern systems have maybe 100 clerks per store, and many stores; how do you perform a "Ctrl+Z" test in that case? Even if you had a perfectly divided line of people waiting at each cashier in each store (machine), the worst case would be experienced by the people in line at the store or clerk whose service time degraded. Thus, for accuracy, you would need to measure the queue depth at the moment of maximum latency per thread (clerk) and add that latency to each subsequent request until every request in the queue has been served. That kind of math requires constant sampling that would slow the system down so dramatically it would defeat the purpose. I think this becomes even clearer when you consider that most such systems have load-balancing strategies that further mitigate queue depth by intentionally distributing requests to whichever backend services have the lowest historical latencies (and yes, I realize those algorithms are likely plagued by the same "coordinated omission" mentioned -- but they certainly don't distribute requests uniformly).

In summary, let's focus on the max latency, home in on which backend exhibited it, identify the depth of the queue at the moment that latency occurred, and use that information to model the impact on users. From that, I expect you can derive some meaningful latency percentiles without having to measure more data points than is feasible without making latency even worse.

Am I misunderstanding something? I'm no math whiz; this is mostly intuition.
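To make that last idea concrete, here's a rough sketch of the correction I have in mind (Python; the numbers -- a 2 s stall with 200 requests queued behind a clerk whose normal service time is 10 ms -- are invented purely to show the shape of the math, not taken from the article):

```python
# Sketch: model what the queued requests experienced during a single stall,
# merge that with the measured samples, and read off percentiles.
# All numbers below are hypothetical.

def model_queued_latencies(max_latency, queue_depth, service_time):
    """Estimate the latency seen by each request queued behind the stall.

    Request k waits for the stall to clear, minus the k service slots that
    drained ahead of it (never less than the normal service time).
    """
    return [
        max(max_latency - k * service_time, service_time)
        for k in range(queue_depth)
    ]


def percentile(samples, p):
    """Nearest-rank percentile, p in [0, 100]."""
    ordered = sorted(samples)
    idx = int(round(p / 100.0 * (len(ordered) - 1)))
    return ordered[min(max(idx, 0), len(ordered) - 1)]


if __name__ == "__main__":
    measured = [0.010] * 9_990        # 9,990 "normal" 10 ms responses
    max_latency = 2.0                 # one observed 2 s stall...
    queue_depth = 200                 # ...with 200 requests queued behind it
    service_time = 0.010

    corrected = measured + model_queued_latencies(max_latency, queue_depth, service_time)

    for p in (50, 99, 99.9, 100):
        print(f"p{p:g}: {percentile(corrected, p) * 1000:.1f} ms")
```

With those toy numbers, a single stall that naive measurement would record only once ends up dominating everything from roughly the 99th percentile up, which is the effect I'd expect the modeling to surface.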