The problem with benchmarks is that it's really, really difficult to emulate real-world conditions. However, here are some of the more obvious points that are unrealistic.

1) Comparing at the same number of cores. The core count selected for each testing level is completely arbitrary. With both web and database servers, which scale well with increasing core count, single-threaded performance is generally less of a concern and should not be a point of measure aside from average page load time. Some server configurations are optimized for a higher number of slower cores, while others are optimized for fewer but faster cores. By comparing like core counts, this testing is highly skewed toward the latter.

Comparing packages at the same price point, or by where the package sits in the product lineup (smallest, median, largest instance), would be much fairer. If 4 cores at one provider cost the same as 1 core at another, it's fair to compare the two at different core counts.

2) Server configurations. For both web and database servers, the best performance optimization you can make is to cache to RAM as much as possible. With increased caching, the need for disk I/O goes down significantly, easily by as much as an order of magnitude. Serving static content uses minimal resources and is mostly dependent on network performance. Dynamic content is more CPU intensive, and most of the time you can and should be caching the compiled opcode/bytecode. Most website database usage is read heavy, and many of the queries can be cached as well (rough sketch of the idea below). The one drawback to a heavy emphasis on caching is that if the server restarts, there may not be enough resources to service all requests while the cache warms up. However, given that dynamic loads are precisely what cloud offerings are supposed to excel at, you can spin up additional instances at those times, or just take a horizontally scaled approach to begin with so that a single instance failing won't have a major impact on your aggregate capacity.

3) Synthetic benchmarks, by their very nature, do a poor job of emulating real-world performance. The best way to benchmark both the web server and the database is to take a real site, log all the requests, and replay the logs (see the replay sketch below). What you want to measure is the maximum number of requests or queries that can be served, plus the average time and standard deviation at different request/query rates.

4) Network speed tests. The biggest mistake most tests make is measuring performance from content network to content network, rather than from content network to eyeball network. Especially with the current peering disputes between carriers and eyeball networks, this is more important than ever. It's a very difficult problem to solve, though, as it's not easy to run throughput tests from a large number of different eyeball networks. You would have to take a very large number of client-generated results and compare all the different providers across all their different locations, which would be nearly impossible. The next best thing, still a lot of work but more feasible, is to collect IPs on eyeball networks in as many locations as possible (perhaps just the top X cities by population) and run continuous pings/traceroutes over an extended period of time (sketch below). Average latency, standard deviation, and packet loss % are then your metrics.
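
To illustrate the query-caching idea from point 2, here's a minimal read-through cache sketch in Python. Everything in it is illustrative: run_query() is a hypothetical stand-in for whatever DB driver you use, and the 60-second TTL is arbitrary. The point is just that repeated reads served from RAM never touch the database or disk.

    # Minimal read-through cache sketch (illustrative only): query results
    # are kept in RAM with a TTL so repeated reads skip the database/disk.
    import time

    CACHE = {}          # sql -> (expires_at, rows)
    TTL_SECONDS = 60    # arbitrary TTL for the example

    def run_query(sql):
        # placeholder for a real database call via your DB driver
        raise NotImplementedError

    def cached_query(sql):
        now = time.time()
        hit = CACHE.get(sql)
        if hit and hit[0] > now:        # fresh cache entry: no DB I/O
            return hit[1]
        rows = run_query(sql)           # cache miss: hit the database once
        CACHE[sql] = (now + TTL_SECONDS, rows)
        return rows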
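
For point 3, a rough sketch of the log-replay approach, assuming you've dumped one logged request path per line into paths.txt and stood up a test host (BASE_URL and RATE are placeholders): it replays the paths at a fixed rate and reports mean latency, standard deviation, and failure count at that rate. Run it at several rates to see where latency and failures start to climb.

    # Rough log-replay sketch: replay logged request paths against a test
    # host at a fixed rate and record per-request latency.
    import statistics, time, urllib.request
    from concurrent.futures import ThreadPoolExecutor

    BASE_URL = "http://test-server.example"   # hypothetical target
    RATE = 50                                  # requests per second to replay at

    def fetch(path):
        start = time.time()
        try:
            urllib.request.urlopen(BASE_URL + path, timeout=10).read()
            return time.time() - start
        except Exception:
            return None                        # count as a failed request

    def replay(paths):
        latencies, failures = [], 0
        with ThreadPoolExecutor(max_workers=RATE) as pool:
            futures = []
            for path in paths:
                futures.append(pool.submit(fetch, path))
                time.sleep(1.0 / RATE)         # crude fixed-rate pacing
            for f in futures:
                rtt = f.result()
                if rtt is None:
                    failures += 1
                else:
                    latencies.append(rtt)
        print("requests:", len(paths), "failures:", failures)
        if len(latencies) >= 2:
            print("mean latency: %.3fs" % statistics.mean(latencies))
            print("stdev: %.3fs" % statistics.stdev(latencies))

    if __name__ == "__main__":
        with open("paths.txt") as f:           # one logged request path per line
            replay([line.strip() for line in f if line.strip()])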
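
And for point 4, a sketch of the continuous-ping idea. It assumes you've already assembled a list of eyeball-network IPs (eyeball_ips.txt) and it shells out to the system ping binary with Linux-style flags, so the flags may need adjusting per OS; in practice you'd run it for days or weeks rather than the ten rounds shown here.

    # Ping a list of eyeball-network IPs repeatedly and accumulate average
    # latency, standard deviation, and packet-loss % per IP.
    import re, statistics, subprocess, time
    from collections import defaultdict

    results = defaultdict(lambda: {"rtts": [], "sent": 0, "lost": 0})

    def ping_once(ip):
        out = subprocess.run(["ping", "-c", "1", "-W", "2", ip],
                             capture_output=True, text=True)
        match = re.search(r"time=([\d.]+)", out.stdout)
        return float(match.group(1)) if match else None   # None == lost packet

    def run(ips, rounds=10, interval=60):
        for _ in range(rounds):                # in practice: run for days/weeks
            for ip in ips:
                rtt = ping_once(ip)
                results[ip]["sent"] += 1
                if rtt is None:
                    results[ip]["lost"] += 1
                else:
                    results[ip]["rtts"].append(rtt)
            time.sleep(interval)
        for ip, r in results.items():
            loss = 100.0 * r["lost"] / r["sent"]
            if r["rtts"]:
                print(ip, "avg %.1f ms" % statistics.mean(r["rtts"]),
                      "stdev %.1f" % statistics.pstdev(r["rtts"]),
                      "loss %.1f%%" % loss)
            else:
                print(ip, "loss 100%")

    if __name__ == "__main__":
        with open("eyeball_ips.txt") as f:
            run([line.strip() for line in f if line.strip()])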