Feeding data to 1000 CPUs – comparison of S3, Google, Azure storage

216 points by ranrub over 9 years ago

13 comments

oavdeev over 9 years ago
Stock Ubuntu needs the SR-IOV driver to reach the actual bandwidth limit on EC2; it makes a lot of difference. We routinely get ~2 Gbps down from S3 with that setup (using the largest instance types).

Edit: Gbps, not GBps.
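The Gbps/GBps correction above is a factor of eight (bits versus bytes). A small sketch of the arithmetic, assuming decimal units; the function names are made up for illustration, and the 150 GB figure is the data set size quoted elsewhere in this thread:

```python
def gbps_to_gigabytes_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def transfer_seconds(size_gb: float, gbps: float) -> float:
    """Ideal time to move size_gb gigabytes over a link running at gbps."""
    return size_gb / gbps_to_gigabytes_per_sec(gbps)


# ~2 Gbps is 0.25 GB/s, so a 150 GB data set takes about 10 minutes at best.
print(gbps_to_gigabytes_per_sec(2))
print(transfer_seconds(150, 2))
```

This ignores protocol overhead and request latency, so real transfers would be somewhat slower.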
lowbloodsugar over 9 years ago
If you are pulling large files from S3, we have found that downloads can be sped up by requesting multiple ranges simultaneously. It is easy to hit 5 Gb/s or 10 Gb/s on instances with the necessary bandwidth, accessing a single file or multiple files. We have not encountered a limit on S3 itself. YMMV.
jedberg over 9 years ago
AWS has a limit on the total throughput any one *account* can have to S3, so the more CPUs OP adds, the worse OP's performance will be on each one. I suspect the other providers have the same restriction.

I either missed it or OP didn't specify how many instances they were using at once to run their benchmark, but the more instances they used, the worse it will be per node.

This did not seem to be accounted for.

EDIT: OP says below it was from one instance, so what I said doesn't apply to this writeup.
ChuckMcM over 9 years ago
When I see things like "data set size 150GB" and "1000 CPUs" I just naturally assume they are all in memory and never come from disk :-)
ranrub over 9 years ago
With kernel tuning, S3 performance improves (and will probably improve on GC/Azure as well). Also, the author uses Ubuntu 14.04 (see https://twitter.com/Zbjorn/status/684492084422688768), which doesn't use AWS "Enhanced Networking" by default. It would be interesting to see results for tuned systems.
skywhopper over 9 years ago
Very interesting comparison, glad to see it. I don't have a comment on the content itself, but I do have a note on the presentation.

The colors used for S3 and Azure Storage in the graphs are very nearly indistinguishable to me, as I have moderate red-green colorblindness. They are easier to tell apart on the bar graphs, since the patches of color are much larger, although I still have to work at it and use the hints from the labels. On the line graphs, they are basically impossible to tell apart. A darker shade of green would solve the problem for me personally, but I'm not all that bad a case, nor an expert on the best shades to pick for general color-blindness accessibility.

Just something to think about when presenting data like this.
jen20 over 9 years ago
Has the author (if they are reading here) considered using Joyent's Manta to take the processing to the data instead?
rmcpherson over 9 years ago
In S3 tests on c3.8xlarge instances, I've seen 8 Gbps throughput on both uploads and downloads using parallelized requests. Testing with iperf between two of the same instances maxed out at about 8 Gbps as well, so the throughput limitation is likely EC2 networking rather than S3.

These tests were done over a year ago, so bandwidth limitations on EC2 may have changed since.

This testing was with https://github.com/rlmcpherson/s3gof3r
imperialdrive over 9 years ago
Thanks for sharing your research - I've been up to the neck in EC2 migrations and trying to benchmark as I go... S3 is the next chunk of work. Rock on!
hrez over 9 years ago
What's missing from the description is the network setup. Is it EC2 Classic or VPC? Is EC2 getting to S3 through an IG (internet gateway)? Hopefully not through NAT. There is also the VPC endpoint to S3. All of these may have different performance profiles, especially with multiple instances.
dwelch2344 over 9 years ago
I'd be interested to see how AWS's Elastic File System (EFS) compares (though I'd imagine it's not great, given it's mounted via NFS).
frik over 9 years ago
How reliable is Azure? For example, the story of GitLab on Azure was a disaster: https://news.ycombinator.com/item?id=10781263 Something like that wouldn't happen on AWS, GC, SoftLayer, etc.
qaq over 9 years ago
WTF, why would one deploy such a thing in the cloud?