Inspired by many of the amazing new-wave Unix tools that have come about over the past several years (fzf, ripgrep, fd, lsd, etc.), I set out to create a more modern and performant version of the classic Unix cp command. Written in Rust for the language's excellent ergonomics and performance characteristics, fcp copies large files and directories in a fraction of the time that cp does.

Feel free to ask me any questions about the project!
For SSDs this is probably the right thing (though maybe use io_uring?).

For hard drives this is the opposite of what you want. For hard drives, especially when copying large trees, you want only one thread accessing the disk, and preferably with its stat()s and open/read/close() calls sorted by inode.

Also, this tool seems to lack most of the options that cp(1) has.
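A minimal sketch of that inode-ordering idea on Linux; the function name and the flat, single-directory walk are just illustrative, not how fcp actually works:

    use std::io;
    use std::os::unix::fs::DirEntryExt; // DirEntry::ino()
    use std::path::PathBuf;

    // Collect one directory's entries, then sort by inode number so a single
    // reader thread sweeps the disk roughly in on-disk order instead of
    // seeking back and forth. (A real tool would recurse and order the
    // stat()s the same way.)
    fn files_in_inode_order(dir: &str) -> io::Result<Vec<(u64, PathBuf)>> {
        let mut entries: Vec<(u64, PathBuf)> = std::fs::read_dir(dir)?
            .filter_map(Result::ok)
            .map(|e| (e.ino(), e.path()))
            .collect();
        entries.sort_by_key(|&(ino, _)| ino);
        Ok(entries)
    }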
It looks like this is using multiple threads to copy files. That's fine for NVMe, which tends to achieve more throughput at higher queue depths, but it will degrade performance when copying large files on spinning rust.
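For context, on Linux a copy tool can check whether the backing device is rotational before choosing its parallelism. A rough sketch, assuming the whole-disk device name has already been resolved (the helper name is hypothetical, and the mapping from a file path to its device via st_dev is left out):

    use std::fs;

    // Linux exposes whether a block device is rotational at
    // /sys/block/<dev>/queue/rotational ("1" = spinning disk, "0" = SSD/NVMe).
    // `dev` is the whole-disk name, e.g. "sda" or "nvme0n1".
    fn is_rotational(dev: &str) -> bool {
        fs::read_to_string(format!("/sys/block/{dev}/queue/rotational"))
            .map(|s| s.trim() == "1")
            .unwrap_or(false)
    }

A tool could then use one reader thread for rotational devices and many for NVMe.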
The only thing I'd like "new" in cp is an optional progress indicator for large files.

Btw, copying files is mostly bound by disk speed (or the network, for a share), not so much by CPU, so CPU performance doesn't seem like a valid reason to rewrite it in Rust...
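A progress indicator doesn't take much. Here's a minimal standard-library-only sketch of a chunked copy with a percentage readout (the function name is just illustrative; a real tool would likely reach for something like the indicatif crate and show rate/ETA):

    use std::fs::File;
    use std::io::{self, Read, Write};

    // Copy in 1 MiB chunks and print a percentage to stderr as we go.
    fn copy_with_progress(src: &str, dst: &str) -> io::Result<u64> {
        let total = std::fs::metadata(src)?.len().max(1);
        let mut reader = File::open(src)?;
        let mut writer = File::create(dst)?;
        let mut buf = vec![0u8; 1 << 20];
        let mut copied = 0u64;
        loop {
            let n = reader.read(&mut buf)?;
            if n == 0 {
                break;
            }
            writer.write_all(&buf[..n])?;
            copied += n as u64;
            eprint!("\r{:5.1}%", copied as f64 / total as f64 * 100.0);
        }
        eprintln!();
        Ok(copied)
    }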
“Rewrite (the easy parts of) $battle-tested-ancient-coreutil in Rust” seems interesting as an exercise or as a way to learn a new language, but I'm not convinced of its broader utility.
Why do people think it is acceptable for utility programs like file copiers to launch many threads and saturate my CPU? There are usually many things being done on the system, and I don't want one task to take up all the resources I have. Most classic Unix command-line tools use only a single thread, so I know they won't bog down the entire system without any effort on my part (adjusting niceness, cgroups, ulimit, etc.).
I just tried this between my two servers over NFS, compared against cp, rsync, and rclone. This tool wins by more than 2x: ~3-4 Gbps vs. ~1.5 Gbps. Great work!
There was a tool for Solaris called "mtwrite" that would use LD_PRELOAD to intercept writes and farm them out to threads. That way it worked not just with cp, but also with tar extracts to files, etc.

http://www.maier-komor.de/mtwrite.html
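mtwrite itself is written in C, but the interposition half of the trick looks roughly like this in Rust, built as a cdylib and loaded via LD_PRELOAD. This hedged sketch only intercepts write() and forwards it to the real libc symbol; an mtwrite-style shim would instead hand the data off to worker threads:

    use std::ffi::c_void;

    type WriteFn = unsafe extern "C" fn(i32, *const c_void, usize) -> isize;

    // Exported from a cdylib and loaded with LD_PRELOAD, this shadows libc's
    // write(). Here it just looks up the next "write" in resolution order and
    // forwards the call; a real mtwrite-style shim would copy `buf` and queue
    // (fd, data) to a thread pool instead of writing synchronously.
    #[no_mangle]
    pub unsafe extern "C" fn write(fd: i32, buf: *const c_void, count: usize) -> isize {
        let real = libc::dlsym(libc::RTLD_NEXT, b"write\0".as_ptr().cast());
        let real: WriteFn = std::mem::transmute(real);
        real(fd, buf, count)
    }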
> The massive difference in performance in this case is due to fcp using fclonefileat and fcopyfile under the hood

GNU Coreutils' cp, the thing you use on most GNU/Linux systems, is not the "Classic Unix cp".

It has options for copy-on-write cloning via kernel-specific methods.

COW copying is opt-in probably because it's not a true copy. If either file sustains damage, both are toast.
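On Linux, the kernel-specific method behind cp's `--reflink` option is the FICLONE ioctl. A sketch of that opt-in clone path (the ioctl constant is spelled out inline rather than assuming the libc crate exports it, and the function name is just illustrative):

    use std::fs::File;
    use std::os::unix::io::AsRawFd;

    // Linux reflink: make dst share src's extents copy-on-write, the way
    // `cp --reflink` does on filesystems like Btrfs/XFS. _IOW(0x94, 9, int):
    const FICLONE: u64 = 0x4004_9409;

    fn reflink(src: &File, dst: &File) -> std::io::Result<()> {
        // Fails with e.g. EOPNOTSUPP or EXDEV when the filesystem can't clone,
        // at which point a tool falls back to a real data copy.
        let ret = unsafe { libc::ioctl(dst.as_raw_fd(), FICLONE as _, src.as_raw_fd()) };
        if ret == -1 {
            Err(std::io::Error::last_os_error())
        } else {
            Ok(())
        }
    }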
https://www.unix.com/man-page/mojave/2/fclonefileat/

Summary: this is a nice COW/cloning wrapper for use on a single SSD partition!

But that's cloning, not copying; it doesn't apply across volumes or partitions, and when you actually want to copy with fcp, it can be very slow, as detailed in other comments.
I worked with a large cluster filesystem and it had some interesting properties. In particular, files were effectively append-only and immutable when closed, but the FS supported "snapshot for append", which allowed you to make a snapshot of a file and append to it. Under the hood it managed all the pointers, but when the FS designers found out we used the feature heavily, they got worried.
I'm afraid the only way a file-copying tool can be faster is by sacrificing contiguity and increasing fragmentation. Isn't that the case?

I want as little fragmentation as possible, so that it isn't too hard to manually recover a file if I accidentally delete something or damage the file system.
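One common mitigation is to preallocate the destination at its final size before writing, which gives the filesystem a chance to hand back a contiguous extent. A sketch for Linux using posix_fallocate (the helper name is hypothetical):

    use std::fs::File;
    use std::os::unix::io::AsRawFd;

    // Ask the filesystem to reserve the full length up front; many allocators
    // will then try to place the data in one contiguous extent.
    fn preallocate(dst: &File, len: u64) -> std::io::Result<()> {
        // posix_fallocate returns an errno value directly (0 on success).
        let err = unsafe { libc::posix_fallocate(dst.as_raw_fd(), 0, len as libc::off_t) };
        if err == 0 {
            Ok(())
        } else {
            Err(std::io::Error::from_raw_os_error(err))
        }
    }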
Unfortunately, I find it's not enough to copy large files; you need to verify them as well. I don't trust the hard drive controller or the NVMe controller not to have f'ed something up.
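For reference, a post-copy verification pass can be done with the standard library alone by streaming both files and comparing them chunk by chunk; a checksum such as SHA-256 catches the same corruption and also gives you something to record. A minimal sketch, with illustrative function names:

    use std::fs::File;
    use std::io::{self, Read};

    // Read up to buf.len() bytes, stopping short only at EOF.
    fn read_full(r: &mut impl Read, buf: &mut [u8]) -> io::Result<usize> {
        let mut filled = 0;
        while filled < buf.len() {
            let n = r.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        Ok(filled)
    }

    // Stream both files and compare them chunk by chunk.
    fn files_identical(a: &str, b: &str) -> io::Result<bool> {
        let (mut fa, mut fb) = (File::open(a)?, File::open(b)?);
        let (mut ba, mut bb) = (vec![0u8; 1 << 20], vec![0u8; 1 << 20]);
        loop {
            let na = read_full(&mut fa, &mut ba)?;
            let nb = read_full(&mut fb, &mut bb)?;
            if na != nb || ba[..na] != bb[..nb] {
                return Ok(false);
            }
            if na == 0 {
                return Ok(true);
            }
        }
    }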