It seems to me that your original idea was actually the correct one: rsync probably would have been the best way to do this (and, separately, a truck full of disks probably would have been the other best way).

First, rsync took too long probably because you used just one process and didn't optimize your command-line options. Most of the performance problems with rsync on large filesystem trees come from using a single command to run everything, something like:

rsync -av /source/giant/tree/ /dest/giant/tree/

That single process has to crawl, checksum, and copy everything by itself, which is not only slow in general but leaves most of a modern multicore machine idle.

Much better to break it up into many parallel rsyncs, something like:

rsync -av /source/giant/tree/subdir1/ /dest/giant/tree/subdir1/

rsync -av /source/giant/tree/subdir2/ /dest/giant/tree/subdir2/

rsync -av /source/giant/tree/subdir3/ /dest/giant/tree/subdir3/

(there's a sketch of how to script this at the end of this comment). That alone probably would have dramatically sped things up, BUT you do still have your speed-of-light issues.

This is where Amazon Import/Export comes in: do a one-time tar/rsync of your data onto an external 9TB array, ship it to Amazon, have them import it into S3, then load it onto your local Amazon machines.

You now have two copies of your data: one on S3, and one on your Amazon machine.

Then you run your optimized rsync to bring the copy up to a relatively consistent state - i.e. it runs for 8 hours to catch up, so now you're only 8 hours behind.

Then you take a brief downtime, run the optimized rsync one more time, and now you have two fully consistent filesystems.

No need for DRBD and all the rest of it - just rsync and an external array.

I've used this method to duplicate terabytes and terabytes of data, including tens of millions of small files. It works, and it has a lot fewer moving parts than DRBD.
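
If you'd rather script the parallel rsyncs than type one per subdirectory, here's a rough sketch - not a drop-in script; it assumes GNU find/xargs, and the 8-way parallelism and the /source/giant/tree paths are placeholders you'd tune for your own layout and hardware:

# one rsync per top-level subdirectory, up to 8 running at a time
find /source/giant/tree -mindepth 1 -maxdepth 1 -type d -printf '%f\0' \
  | xargs -0 -P 8 -I{} rsync -a /source/giant/tree/{}/ /dest/giant/tree/{}/

# the loop above only covers subdirectories; one non-recursive pass
# picks up any files sitting directly in the top level
rsync -a --exclude='*/' /source/giant/tree/ /dest/giant/tree/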
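
And for the "load it onto your local Amazon machines" step: once the Import/Export job has landed in S3, pulling it down to the instance can be as simple as something like this (assuming the AWS CLI is available; the bucket name and paths are placeholders):

# pull the imported data from S3 down to local disk on the EC2 machine
aws s3 sync s3://your-import-bucket/giant/tree /dest/giant/tree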