A few things about the article that made me think "hmmmm":

No mention of the testing set-up. Was the test client running on a different machine from the server? What kind of machines? What kind of network?

Many of the charts have the same "ballooning" shape, despite measuring very different systems. I think this is due to the "attempt to correct coordinated omission by filling in additional samples". As I understand it, all charts but the first have this correction applied (and it does sound like it is applied by manipulating the data, not by altering the measuring method). To understand the effect this might have, imagine testing a system that has a single request queue by making requests on a regular schedule, say at 1ms intervals. Most of the time these take much less than 1ms, but one request is an outlier and takes 100ms. What will the "corrected" results look like? The worst case will be 100ms, the second worst case 99ms, the third worst case 98ms, and so on. On a linear horizontal scale this would give a linear slope at the right-hand side of the chart; change to a logarithmic horizontal scale and you get the shape seen in many of the charts in this article. This makes it impossible to tell whether the worst cases are due to a small number of outliers or not. I believe the correction is well-meaning, but I think the uncorrected results would be more informative. (I've put a small simulation of this effect further down in this comment.)

The use of line charts is a bit odd. They are connected to the origin, which is obviously a fiction. They are also slightly smoothed: where steps are visible, the steps have a gradient rather than being vertical lines. Where the number of data points is low, this leads to odd effects: in the two 1MB charts, the right third of the chart is just showing the value of a single data point! A scatter plot might give the reader a more honest impression.

The logarithmic horizontal scale of those charts tends to focus attention on the worst cases. That's not unreasonable; in some contexts, that's what you really care about. But outliers can occur due to environmental effects like kernel scheduling, VM scheduling, or dropped packets on a noisy network, unless you make an effort to prevent such things. And it makes it very hard to see the typical values on the charts for RabbitMQ and Kafka, where the range of Y values is large. Can you tell what the median latency is for RabbitMQ or Kafka at any message size? It looks like about 0.5ms to me, but it's hard to read off any of the charts.

The number of messages involved is different for different message sizes. You can see that from the way the 1MB charts are stepped, while the charts for smaller message sizes are smooth. For 1MB messages, it looks like there are 5k or 10k samples on the charts; for the smaller message sizes, probably far more. Were all the tests run for roughly the same amount of time? Tests run for longer might see more outliers due to the environment.

"The 1KB, 20,000 requests/sec run uses 25 concurrent connections", with the implication that other test runs used different levels of concurrency. So what were they? And what was the impact of changing the concurrency level while the message size and rate stayed constant?

Is it possible that the client program making the measurements was introducing artefacts of its own (for example, being written in Go, did it encounter any GC pauses)? It would be interesting to see the results of measurements against a simple TCP echo server, as a control.
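Something along these lines would do as a control. It's only a sketch, not the article's benchmark client, and running client and server on the same box over loopback like this only exposes client-side and runtime noise, not network effects:

    package main

    import (
    	"fmt"
    	"io"
    	"net"
    	"sort"
    	"time"
    )

    func main() {
    	// Trivial echo server: whatever arrives on a connection is written back.
    	ln, err := net.Listen("tcp", "127.0.0.1:0")
    	if err != nil {
    		panic(err)
    	}
    	go func() {
    		for {
    			c, err := ln.Accept()
    			if err != nil {
    				return
    			}
    			go io.Copy(c, c)
    		}
    	}()

    	// Client: send a small payload repeatedly and record round-trip times.
    	conn, err := net.Dial("tcp", ln.Addr().String())
    	if err != nil {
    		panic(err)
    	}
    	defer conn.Close()

    	payload := make([]byte, 1024) // 1KB, one of the message sizes in the article
    	reply := make([]byte, len(payload))
    	var latencies []time.Duration

    	for i := 0; i < 10000; i++ {
    		start := time.Now()
    		if _, err := conn.Write(payload); err != nil {
    			panic(err)
    		}
    		if _, err := io.ReadFull(conn, reply); err != nil {
    			panic(err)
    		}
    		latencies = append(latencies, time.Since(start))
    	}

    	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
    	fmt.Printf("median=%v p99=%v max=%v\n",
    		latencies[len(latencies)/2],
    		latencies[len(latencies)*99/100],
    		latencies[len(latencies)-1])
    }

If even this shows multi-millisecond spikes at the tail, those spikes say more about the environment than about any broker.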
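And to make the coordinated-omission point above concrete, here is a small simulation of the back-fill style of correction as I understand it. The 1ms interval, 200µs typical latency and single 100ms outlier are invented numbers, and the correction logic is my reading of the description, not the article's actual code:

    package main

    import (
    	"fmt"
    	"sort"
    	"time"
    )

    func percentile(sorted []time.Duration, p float64) time.Duration {
    	if len(sorted) == 0 {
    		return 0
    	}
    	return sorted[int(p/100*float64(len(sorted)-1))]
    }

    func main() {
    	const interval = time.Millisecond // intended request interval

    	// Raw measurements: 10,000 fast requests plus a single 100ms outlier.
    	var raw []time.Duration
    	for i := 0; i < 10000; i++ {
    		raw = append(raw, 200*time.Microsecond)
    	}
    	raw = append(raw, 100*time.Millisecond)

    	// "Corrected" measurements: for each sample longer than the intended
    	// interval, back-fill synthetic samples at latency-1ms, latency-2ms, ...
    	corrected := append([]time.Duration(nil), raw...)
    	for _, d := range raw {
    		for extra := d - interval; extra > 0; extra -= interval {
    			corrected = append(corrected, extra)
    		}
    	}

    	sort.Slice(raw, func(i, j int) bool { return raw[i] < raw[j] })
    	sort.Slice(corrected, func(i, j int) bool { return corrected[i] < corrected[j] })

    	for _, p := range []float64{50, 99, 99.9, 99.99} {
    		fmt.Printf("p%-6v raw=%-10v corrected=%v\n",
    			p, percentile(raw, p), percentile(corrected, p))
    	}
    }

With these numbers the uncorrected 99.9th percentile is 200µs, while the "corrected" one is tens of milliseconds, and every sample in that corrected tail is synthetic, derived from the single outlier.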
My criticisms may seem too harsh. It is too much to expect someone to spend weeks doing rigorous measurements, and the resulting article would be so long that hardly anyone would read all of it (sounds like academia!). Someone might say that I should do my own experiments if I think I can do them better, but I have a day job too. I don't want to discourage the author; I think it is good that he did the work and put it up for everyone to see. But when articles like this get linked on HN and read by lots of people, they can easily come to be regarded as conclusive. Ideas about the performance of various projects get established that might not be well-founded and can take years to dispel. So all I'm saying is: reader beware!