How Rsync Works

117 points by thedookmaster, over 11 years ago

12 comments

temp45234, over 11 years ago
An interesting alternative to rsync is zsync (http://zsync.moria.org.uk/). A very brief summary of the differences:

* Instead of the sender generating checksums on demand, the checksums are generated once, when the file is "published", and saved in a zsync metadata file.

* The receiver fetches this metadata file (a simple copy) and uses it to decide which portions of the file it needs. It then requests only those portions.

* Because of this simplification, the protocol can work over plain stateless HTTP. Any HTTP server that supports range requests can be a zsync server. Remote zsync files are represented by HTTP URLs.

* Note that this all but removes the CPU requirement on the sender/server.

I've used zsync in some very large systems to efficiently distribute write-few, read-often files with only partial changes to many endpoints. It is much more scalable than rsync because there is almost no CPU cost for the server/sender.

I also maintain a fork of zsync that uses libcurl rather than the original author's custom HTTP client code. This fork is primarily to support SSL: https://github.com/eam/zsync

It's a cool project, check it out!
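
The receiver-side step described above, deciding which portions to request from a precomputed metadata file, can be sketched roughly as follows. This is a minimal illustration and not zsync's actual file format or checksum scheme: `publish`, `missing_ranges`, the 4-byte block size, and the use of SHA-256 are all assumptions made for the example, and hashing every window of the old file is far slower than a rolling checksum would be.

```python
import hashlib
from typing import List, Tuple

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to trace by hand


def publish(data: bytes, block_size: int = BLOCK_SIZE) -> Tuple[int, List[bytes]]:
    """Publisher side, run once: total length plus one digest per block."""
    digests = [hashlib.sha256(data[i:i + block_size]).digest()
               for i in range(0, len(data), block_size)]
    return len(data), digests


def missing_ranges(old: bytes,
                   metadata: Tuple[int, List[bytes]],
                   block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Receiver side: inclusive (start, end) byte ranges of the published
    file that cannot be rebuilt from data already present in `old`."""
    total_length, digests = metadata
    # Digest of every window of the old file, at every byte offset
    # (windows near the end of the file are shorter than block_size).
    have = {hashlib.sha256(old[i:i + block_size]).digest()
            for i in range(len(old))}
    ranges: List[Tuple[int, int]] = []
    for index, digest in enumerate(digests):
        if digest in have:
            continue  # this block can be copied from the local file
        start = index * block_size
        end = min(start + block_size, total_length) - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # merge with the previous range
        else:
            ranges.append((start, end))
    return ranges
```

Each returned pair maps onto an HTTP range request header such as `Range: bytes=4-7`, which is why a server that supports range requests needs no zsync-specific code.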
Ygor, over 11 years ago
The main problem with the standard rsync utility is the protocol. Check out the Rsync Protocol section of this document:

"A well-designed communications protocol has a number of characteristics."

<list of characteristics>

"Rsync's protocol has none of these good characteristics."

...

"It unfortunately makes the protocol extremely difficult to document, debug or extend. Each version of the protocol will have subtle differences on the wire that can only be anticipated by knowing the exact protocol version."

This is why it is very hard to implement a client program that can communicate with the standard rsync daemon on a server. You can always use the rsync program itself to communicate with the server, but this is not always an option. Even when it is, it can get ugly. On Windows, you need Cygwin or similar to run rsync.exe, which can complicate the deployment of your desktop app or shell extension.

An easy rsync client API would be useful if you were building an app that stores files on an rsync server, because the rsync utility and the rsync algorithm are great ways to synchronize files efficiently.
beagle3, over 11 years ago
I'm almost sure that this description is out of date and describes rsync 2.

rsync 3 does not need to create or transfer the entire file list. In fact, it starts immediately and has no idea how many files are left; it's not uncommon for it to keep saying "just 1000 more files left" the whole time while working through a million files. You can force it to prescan all files with -m ("--prune-empty-dirs" or something like that) if you insist.

Also, I might be mistaken, but I think rsync 3 doesn't even transfer the entire file list to the other side. It treats the directory like a file (which contains file names, attributes, and checksums) and transfers *that* using rsync. If nothing changed, this takes a few bytes. If something did, the entire directory listing is rsynced to the other side, and it is determined recursively which files and directories actually need to be transferred, with every directory that doesn't have any changes skipped like a file that doesn't need any changes.
Theodores, over 11 years ago
The 'rolling checksum' part of the implementation is brilliant.

I have often wondered why rsync is so life-saving-ly quick, and how a few small changes to a massive file (e.g. from mysqldump) can be copied up to a server from the slow end of an ADSL line so quickly. Now that I know about the 'rolling checksum', I can see what is going on.

Note that I work with people who use 'FTP' to copy files or, even worse, people who find FTP too complicated and have to send me files on a 'Dropbox' thing so I can download them and upload them for them, notionally with 'FTP'. (I will use rsync instead, not least for the bandwidth control options.)

I have even had micro-managers get me to make FTP work on the server for them, despite my protestations about it being insecure (which it really is if you use a Windows PC and something like FileZilla).

Obviously I only use rsync and scp. Without the aforementioned micro-managed requests I would not even know whether FTP was installed on the server side.

My point is that it may be easy for a few folks here to criticise rsync. However, there are a lot of people, from clients to managers and even talented programmers, who just don't have a clue about rsync and are stuck in some stone age of using things like FTP.
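
The rolling checksum mentioned above can be sketched in a few lines. This follows the weak checksum as described in the rsync technical report: two 16-bit sums that can be updated in constant time as the window slides forward one byte. The function names are hypothetical, and this is a simplified illustration rather than rsync's source code; real rsync also pairs the weak checksum with a stronger hash to confirm candidate matches.

```python
from typing import Tuple

M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the (a, b) pair for a block from scratch, in O(len(block))."""
    n = len(block)
    a = sum(block) % M
    # The first byte is weighted n, the last byte is weighted 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide a window of length n one byte to the right, in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit halves into one 32-bit value."""
    return a | (b << 16)
```

Because `roll` needs only the byte leaving the window and the byte entering it, a checksum can be obtained for every byte offset of a large file at roughly the cost of a single pass, which is what makes it cheap to find unchanged blocks that have shifted position.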
cliveowen, over 11 years ago
I always found myself looking for a simple way to back up a hierarchy of folders to an external device and then keep both copies synced. Then I heard about rsync and discovered that it does just that. I've been using it exclusively for all of my backups; it's really useful.

EDIT: Also, since we're talking about rsync, do you think the following options are sufficient for syncing a folder hierarchy from the local disk to an external flash drive?

rsync -aW --delete /source /destination

My main concern is the W option, which skips the usual delta-transfer algorithm (which adds a lot of delay to the already long process of syncing) and might end up writing a lot of bytes and wearing out the memory cells of the flash storage.
almost, over 11 years ago
Something that wasn't clear to me right away is that the generator is running on the remote system (assuming a remote transfer), so in the "generator -> sender -> receiver" bit, each -> is data going over the network.
meltzerj, over 11 years ago
What are people's thoughts on using rsync for production deployment?
nicolast, over 11 years ago
You might be interested in http://blog.incubaid.com/2012/02/14/rediscovering-the-rsync-algorithm/
joeblau, over 11 years ago
Thanks! I'm building an app where I might need to implement this type of syncing paradigm.
ape4, over 11 years ago
Rsync always sends a list of files (and their attributes). But typically most files haven't changed. It could just send the files that have changed since the last sync.
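
For context, here is a rough sketch of how a full file list with attributes can be narrowed down to the files that actually need transferring. It is modeled loosely on rsync's default behaviour of skipping files whose size and modification time match on both sides; the `FileList` type and the `files_to_transfer` function are hypothetical names invented for this illustration, and nothing here reflects rsync's wire format.

```python
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time in seconds); an assumed shape
FileList = Dict[str, Tuple[int, int]]


def files_to_transfer(sender: FileList, receiver: FileList) -> List[str]:
    """Paths the receiver lacks, or whose size or mtime differ."""
    return sorted(path for path, attrs in sender.items()
                  if receiver.get(path) != attrs)


sender = {"a.txt": (10, 100), "b.txt": (20, 200), "c.txt": (5, 300)}
receiver = {"a.txt": (10, 100), "b.txt": (20, 150)}
changed = files_to_transfer(sender, receiver)  # b.txt differs, c.txt is new
```

The comparison needs attributes from both sides, which is one reason a list has to be exchanged unless the sender keeps its own record of what it sent last time.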
UrsaFoot, over 11 years ago
A typo: s/transfe/transfer/
beedogs, over 11 years ago
Too bad there's not a complementary document, "How Rsync Breaks", because that one would be quite useful as well. I've had it fail in the most annoying and arbitrary ways, and it's dissuaded me from using it in any real production situations.