As cool as the rsync algorithm is, I'd much rather we had the dsync utility outlined in this USENIX '08 paper.
<a href="http://www.usenix.org/event/usenix08/tech/full_papers/pucha/pucha.pdf" rel="nofollow">http://www.usenix.org/event/usenix08/tech/full_papers/pucha/...</a><p>It's an adaptive protocol that adjusts dynamically to the system's load, whether the bottleneck is CPU, disk, or network. Does anyone know what happened to it?
There is a Better Way. Instead of using fixed-size blocks, use variable-size blocks, and decide the block boundaries using the data in the blocks themselves. This will reduce your search from O(n^2) to O(n).<p>Tarsnap does this. My project (ddar) does the same.
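A minimal sketch of the content-defined chunking idea from the comment above, under assumptions of my own: the window size, hash base, modulus, and boundary mask are illustrative values, and this is not the scheme that Tarsnap or ddar actually uses. A rolling hash over the last `WINDOW` bytes decides where a block ends, so boundaries depend only on nearby content and move along with the data when bytes are inserted earlier in the file.

```python
WINDOW = 16            # bytes of context that decide a boundary (assumed value)
BASE = 257             # polynomial hash base (assumed value)
MOD = (1 << 31) - 1    # prime modulus for the rolling hash (assumed value)
MASK = (1 << 6) - 1    # boundary when the low 6 bits are zero, ~64-byte average


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of every content-defined chunk in data."""
    if not data:
        return []
    boundaries = []
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        # Only test once the window is full, so the decision is purely local.
        if i >= WINDOW - 1 and (h & MASK) == 0:
            boundaries.append(i + 1)
    if not boundaries or boundaries[-1] != len(data):
        boundaries.append(len(data))  # the tail is always its own chunk
    return boundaries


def chunks(data: bytes) -> list[bytes]:
    """Split data at the content-defined boundaries."""
    out, start = [], 0
    for end in chunk_boundaries(data):
        out.append(data[start:end])
        start = end
    return out
```

Because each boundary is a function of the preceding `WINDOW` bytes only, prepending a few bytes to a file shifts every existing boundary by the same amount, and the chunks after the edit hash to the same values as before. A real tool would also enforce minimum and maximum chunk sizes, which this sketch leaves out.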
The rsync algorithm and program are both great, and I use the program a lot to update directory trees across the network. It's also my default tool for synchronizing two directories on the same system. The rsync program correctly optimizes for this case by skipping the rsync algorithm and completely copying changed files. However, it still uses multiple processes and seemingly still calculates some hashes, making it slower than it needs to be.<p>Joey found [0] that running rsync once in dry-run mode to find what files have been changed, copying them each with cp, then running rsync a second time to handle things like deletions and file permissions resulted in a major speedup.<p>[0] <a href="http://kitenet.net/~joey/blog/entry/local_rsync_accelerator/" rel="nofollow">http://kitenet.net/~joey/blog/entry/local_rsync_accelerator/</a>
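A rough sketch of the three-pass idea described above, not Joey's actual script (see [0] for that). The function names are hypothetical, and it assumes that `rsync -a --dry-run --out-format=%n` prints one changed path per line, with directories ending in `/`. Symlinks and special files are deliberately skipped in the copy pass and left for the final rsync run.

```python
import shutil
import subprocess
from pathlib import Path


def _with_slash(path: str) -> str:
    # A trailing slash makes rsync copy the directory's contents.
    return path.rstrip("/") + "/"


def dry_run_cmd(src: str, dst: str) -> list[str]:
    """Pass 1: ask rsync which paths it would transfer, without copying."""
    return ["rsync", "-a", "--dry-run", "--out-format=%n",
            _with_slash(src), _with_slash(dst)]


def parse_changed(output: str) -> list[str]:
    """Keep file paths from the dry-run listing; drop directory entries."""
    return [line for line in output.splitlines()
            if line and not line.endswith("/")]


def copy_changed(src: str, dst: str, rel_paths: list[str]) -> list[str]:
    """Pass 2: copy regular files directly, as `cp -p` would."""
    copied = []
    for rel in rel_paths:
        s, d = Path(src) / rel, Path(dst) / rel
        if s.is_symlink() or not s.is_file():
            continue  # leave anything unusual to the final rsync pass
        d.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(s, d)  # copies data plus timestamps and mode bits
        copied.append(rel)
    return copied


def final_sync_cmd(src: str, dst: str) -> list[str]:
    """Pass 3: let rsync handle deletions, permissions, and leftovers."""
    return ["rsync", "-a", "--delete", _with_slash(src), _with_slash(dst)]


def accelerated_sync(src: str, dst: str) -> None:
    listing = subprocess.run(dry_run_cmd(src, dst), check=True,
                             capture_output=True, text=True).stdout
    copy_changed(src, dst, parse_changed(listing))
    subprocess.run(final_sync_cmd(src, dst), check=True)
```

Calling `accelerated_sync("/some/src", "/some/dst")` requires rsync on the `PATH`. Whether this is actually faster depends on the rsync version and the workload, so it is worth timing against a plain `rsync -a --delete` before relying on it.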
<i>Don’t walk the folder and ‘rsync’ each file you encounter</i><p>If I just tell rsync to synchronise two directories, what does it do internally? I might have assumed that it takes the more naive approach, but in practice it seems to do a lot of upfront calculation, which suggests it's doing something more sophisticated.
Side note, in case it's helpful to someone: if you need rsync.exe on Windows, here's one option:<p><a href="https://github.com/thbar/rsync-windows" rel="nofollow">https://github.com/thbar/rsync-windows</a>
Do you know of any implementations of the rsync algorithm other than the actual rsync program? And where are they used?<p>Do you know how and where Dropbox uses rsync?<p>There have been some attempts to port the rsync program to other languages/platforms [1], but they are usually not in sync with the current rsync program. I am talking about ports of the program, not new implementations of the algorithm.<p>[1] <a href="https://github.com/MatthewSteeples/rsync.net" rel="nofollow">https://github.com/MatthewSteeples/rsync.net</a>
rsync is great. I use the "-H" and "--link-dest" options to make incremental backups which look like snapshots. Been doing this for the better part of a decade; would be interested to know if there's A Better Solution(tm) out there...
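For readers who have not seen the `-H` / `--link-dest` pattern mentioned above, here is a small sketch of how such a snapshot command might be assembled. The helper name, the directory layout, and the date-style snapshot names are assumptions for illustration, not the commenter's actual setup. Files unchanged since the previous snapshot become hard links into it, so each snapshot looks complete while only changed files take new space.

```python
import os
import subprocess


def snapshot_cmd(src, backup_root, name, previous=None):
    """Build an rsync command that writes snapshot `name` under backup_root.

    `previous` is the name of an earlier snapshot to hard-link against.
    Absolute paths are used because a relative --link-dest is interpreted
    relative to the destination directory.
    """
    root = os.path.abspath(backup_root)
    cmd = ["rsync", "-a", "-H"]  # -a: archive mode, -H: preserve hard links
    if previous is not None:
        cmd.append("--link-dest=" + os.path.join(root, previous))
    cmd += [os.path.abspath(src) + "/", os.path.join(root, name) + "/"]
    return cmd


def take_snapshot(src, backup_root, name, previous=None):
    # Requires rsync on the PATH; raises if rsync exits with an error.
    subprocess.run(snapshot_cmd(src, backup_root, name, previous), check=True)
```

The first run would omit `previous` and make a full copy. Each later run passes the name of the most recent snapshot, and deleting an old snapshot directory is safe because the remaining hard links keep the shared file data alive.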
If someone committed any of that code to a repository I was working on, I'd hang them up. It's 2011 and people are still using one- and two-letter variable names.<p>An interesting article, but I have neither the time nor the inclination to understand the code, which is the core of it.