TechEcho
How to transfer large amounts of data via network

58 points by phelm over 10 years ago

7 comments

moe over 10 years ago
Having transferred petabytes of data in tens of millions of files over the past months, let me assure you there's only one tool that you really need: GNU parallel.

Whether you copy the individual files with ftp, scp or rsync is largely irrelevant. The network is always your ultimate bottleneck. Using a slower copy tool just means having to set a slightly higher concurrency in order to max it out.
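
To make the concurrency point concrete, here is a minimal Python sketch of the same idea: many single-file copies running at once, with the worker count as the knob you turn up until the link is full. It is not GNU parallel, and the names `copy_tree_concurrently` and `workers` are made up for illustration. A local `shutil.copyfile` stands in for whatever per-file tool (scp, rsync, ftp) you would really call.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_concurrently(src_dir, dst_dir, workers=8):
    """Copy every file under src_dir into dst_dir, `workers` files at a time."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Stand-in for one per-file transfer (scp, rsync, ftp, ...).
        shutil.copyfile(src, dst)
        return dst

    # A slower per-file tool only means a higher `workers` value is needed
    # before the network becomes the limit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Threads are enough here because each worker spends its time waiting on I/O rather than computing.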
bwross over 10 years ago
The primary advantage GridFTP has over simply using tar+netcat for performance is that GridFTP can multiplex transfers over multiple TCP connections. This is helpful as long as the endpoint systems limit the per-connection buffer size to some value less than the bandwidth-delay product (BDP) between them. If you've got to bug sysadmins to get GridFTP set up for you on both endpoints, you might as well just ask them to increase the maximum TCP buffer size to match the BDP.

EDIT: Sorry, "multiplex" is not the right word to describe that. It's more like GridFTP "stripes" files across multiple connections; it divides the file into chunks, sends the chunks over parallel connections, and reassembles the file at the destination.
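
A rough Python sketch of the arithmetic behind this, under stated assumptions: the BDP is bandwidth times round-trip time, and if each connection's buffer is capped below that, you need roughly BDP divided by buffer size connections to fill the pipe. The function names and the example link (10 Gbit/s, 50 ms RTT, 4 MiB buffer cap) are illustrative, and `stripe_ranges` is only a simple even split, not GridFTP's actual chunking scheme.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: data in flight on a full link."""
    # bits/s * ms / 1000 gives bits; dividing by 8 gives bytes.
    return bandwidth_bits_per_s * rtt_ms // 8000


def streams_needed(bdp, per_connection_buffer):
    """Connections required when each one is capped at `per_connection_buffer` bytes."""
    return -(-bdp // per_connection_buffer)  # ceiling division


def stripe_ranges(file_size, streams):
    """Split a file into one contiguous (offset, length) range per stream."""
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


# Assumed example link: 10 Gbit/s with a 50 ms round trip.
bdp = bdp_bytes(10_000_000_000, 50)        # 62,500,000 bytes
print(bdp, streams_needed(bdp, 4 * 1024 * 1024))
```

With those assumed numbers the BDP is 62.5 MB, so a 4 MiB buffer cap would need 15 connections, whereas raising the cap to the BDP lets a single connection do the job.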
jefurii over 10 years ago
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
rdtsc over 10 years ago
I like the tar+netcat mentioned towards the bottom for LAN transfer. That usually goes much faster than rsync or scp.

The reason I haven't looked at other tools is that I only do this intermittently and always reach for the tool already installed on the system.
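
For anyone who has not seen the tar+netcat trick: one side streams an uncompressed, unencrypted tar archive straight into a TCP socket and the other side unpacks it as it arrives. Below is a minimal Python sketch of that same pattern using only the standard library. The function names are invented for illustration, and like the netcat version it has no authentication or encryption, so it only makes sense between trusted hosts on a LAN.

```python
import socket
import tarfile


def send_tree(path, host, port):
    """Stream `path` as a tar archive to host:port (roughly `tar c | nc host port`)."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("wb") as stream:
            # "w|" writes a non-seekable tar stream, so nothing is staged on disk.
            with tarfile.open(fileobj=stream, mode="w|") as tar:
                tar.add(path, arcname=".")


def receive_tree(server_sock, dest):
    """Accept one connection and unpack the tar stream (roughly `nc -l | tar x`)."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rb") as stream:
        # "r|" reads the archive sequentially as bytes arrive.
        with tarfile.open(fileobj=stream, mode="r|") as tar:
            # Only unpack archives from peers you trust.
            tar.extractall(dest)
```

The receiver has to be listening before the sender connects, for example on a socket from `socket.create_server(("", port))`, with `receive_tree` called on it first.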
joshAg over 10 years ago
If you have to regularly transfer large amounts of data over a network, it might be worth looking into a WAN optimization product like Riverbed's Steelhead, Silverpeak's VX/NX lines, or Bluecoat Mach 5, or one of the other vendors' solutions.

Yeah, you could try and roll it yourself, since really it just comes down to compressing and deduplicating what you send over the wire, but doing that well and also making it simple to use is not a trivial problem. Why reinvent the wheel badly?
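
To show what "compressing and deduplicating" means at its most basic, here is a toy Python sketch. It is not how any of the products named above work internally. It assumes fixed-size chunks, SHA-256 digests as chunk identities, and zlib compression; the names `encode`, `decode`, `seen` and `store` are invented for illustration. Chunks the receiver already holds are sent as a digest only, and new chunks are sent compressed.

```python
import hashlib
import zlib


def encode(data, seen, chunk_size=4096):
    """Turn `data` into messages: ("ref", digest) or ("raw", digest, compressed).

    `seen` is the set of digests the receiver is assumed to already hold.
    """
    messages = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            messages.append(("ref", digest))  # duplicate: send the digest only
        else:
            seen.add(digest)
            messages.append(("raw", digest, zlib.compress(chunk)))
    return messages


def decode(messages, store):
    """Rebuild the data; `store` maps digest -> chunk on the receiving side."""
    out = bytearray()
    for msg in messages:
        if msg[0] == "raw":
            store[msg[1]] = zlib.decompress(msg[2])
        out += store[msg[1]]
    return bytes(out)
```

Even this toy hints at why doing it well is hard: fixed-size chunks stop matching as soon as one byte is inserted, and both ends have to keep their chunk stores in sync.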
noedig over 10 years ago
This is a good site to visit if you have these kinds of data transfer issues: http://fasterdata.es.net
mschuster91 over 10 years ago
And once you involve Windows, especially with the mentioned "ZOT files", Samba becomes a massive bottleneck...