In the "test setup", it says: "a t3a.micro" and "a t4g.micro".<p>To me, this implies they used a single ec2 instance of each size. However, ec2 instance p99s or so can be impacted by "noisy neighbors", especially on the burstable types which are intentionally oversubscribed.<p>It's still useful to know if, for example, t4gs are more prone to noisy neighbors, but with only 1 instance as a datapoint, you simply can't tell if it was bad luck or not.<p>I think this test would be much better with either only dedicated instance types, or by running it with a large n such that an individual unlucky/noisy-neighbor doesn't influence the results overtly.
Aren't 't' instances burst instances? They need to be under constant load for a long time before their burst credits for CPU, network, and EBS run out, after which they fall back to their baseline performance.

> It does appear that the Arm-based instances can't consistently maintain the same performance at high request rates.

I'm unwilling to take that statement at face value for now, given it's been tested against a 't' instance.

EDIT: Removed the note about network burst credits in compute- and memory-optimized instances. I'm not sure whether those instances have them.
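For what it's worth, the CPU credit balance is exposed in CloudWatch, so it's checkable whether the 't' instances actually ran dry during the run. A minimal boto3 sketch (instance ID, region, and time window are placeholders):

    # Check whether the burstable instance's credit balance dropped toward zero
    # during the benchmark window. Instance ID and region are placeholders.
    import datetime
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=2),
        EndTime=datetime.datetime.utcnow(),
        Period=300,
        Statistics=["Minimum"],
    )

    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Minimum"])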
Personal experience: We moved multiple PostgreSQL servers, including a large one using 32 vCPUs, to the equivalent ARM-based instances, and performance was about the same, but of course the ARM instances are less expensive.
Given the title, I would have expected a price/performance comparison across multiple tiers of servers. Focusing on two random (but similar) low-performance instances makes it hard to generalize.
A couple of recommendations for your visualization:

1) More fine-grained bins, to help show the shape of the distribution (are there performance cliffs?). Try vertical lines to denote percentile cutoffs.

2) Given the wide range between your bins, a log scale might be a better idea than raw frequency.

3) Try some other method of visualization. I'm not sure a histogram is useful for what you're trying to convey, at least the way it's being used here.

As it stands, the visual information is so dominated by the 99.5% case that the plots don't help illustrate your tabular data.
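To illustrate 1) and 2), a quick matplotlib sketch (the latency array is stand-in data, since the raw per-request numbers aren't published):

    # Finer bins + log-scale counts + percentile cutoff lines.
    # `latencies_ms` is synthetic stand-in data, not the article's measurements.
    import numpy as np
    import matplotlib.pyplot as plt

    latencies_ms = np.random.lognormal(mean=1.5, sigma=0.6, size=100_000)

    fig, ax = plt.subplots()
    ax.hist(latencies_ms, bins=200)      # finer bins expose the distribution's shape
    ax.set_yscale("log")                 # log counts keep the tail visible
    for q in (50, 90, 99, 99.5):
        ax.axvline(np.percentile(latencies_ms, q), linestyle="--", linewidth=1,
                   label=f"p{q}")
    ax.set_xlabel("latency (ms)")
    ax.set_ylabel("requests (log scale)")
    ax.legend()
    plt.show()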
Highly recommend ARM-based instances for RDS and ElastiCache in particular. That's an easy instance-type switch and nearly idiot-proof. Switching Kubernetes cluster worker nodes is another story (though adoption of ARM-built containers is getting better).
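For RDS it really is a one-call (or one-console-click) change; a hedged boto3 sketch with placeholder identifiers, noting that the class change is applied with a brief restart, so scheduling it for the maintenance window is usually the safer choice:

    # Switch an RDS instance to a Graviton-based class.
    # Identifier and target class are placeholders.
    import boto3

    rds = boto3.client("rds")

    rds.modify_db_instance(
        DBInstanceIdentifier="my-postgres-db",   # hypothetical DB identifier
        DBInstanceClass="db.r6g.large",          # Graviton counterpart of db.r5.large
        ApplyImmediately=False,                  # apply during the maintenance window
    )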
We leverage Arm instances in Depot [0] to power native Docker image builds for Arm, and I would say we see a lot of performance improvement in machine start times, requests per instance, and overall response rate. Granted, we aren't throwing the number of requests at our instances that this test is looking at, but we are throwing multiple concurrent Docker image builds onto instances, and generally speaking they do great.

All of that to say, I think the t3/t4 instances used in this test are a bit problematic for getting a true idea of performance.

[0] - https://depot.dev/
The r6g Arm instances we tried in our AWS Neptune performance testing always seemed to perform worse than the equivalent-in-price r5d.4xlarge we normally use. Unfortunately I didn't have time to really dig into what it was about our design/workload that caused the difference. I wish I could have dug deeper, especially since there are more instance types available now than when we ran our tests: x2g, r6i, and x2iedn.
Right now the web framework of choice in Rust tends to be Axum. Also, there's no data on CPU utilization, which can differ when targeting ARM.
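If the author wants to add that data, one simple option is sampling utilization on the instance while the load test runs; a small sketch assuming psutil (an assumption on my part, not something the article uses):

    # Sample CPU utilization on the instance during the load test.
    import time
    import psutil

    samples = []
    for _ in range(60):                         # ~1 minute at 1-second resolution
        samples.append(psutil.cpu_percent(interval=1.0))

    print(f"avg={sum(samples)/len(samples):.1f}%  max={max(samples):.1f}%")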
You may also want to include .NET, which has really good support for ARM64.

Also, t4g instances use Graviton 2, which has, relatively speaking, weaker cores. To get the best experience you would need to compare against Graviton 3 (those instances are more expensive, but you can deploy to them more densely).
The instances picked for the setup are absolutely the worst: t3a.micro and t4g.micro.

Such instances share vCPUs and only get bursts of dedicated CPU; once their credits run out, they get throttled.

The author should have picked any of the other instance families. The bare minimum for making an informed decision would be a c6g.medium or c7g.medium.

In my experience, by the way, the c7g family really seems to be closing the gap in single-threaded performance with x86-based instances.