Can I add another gotcha - if you have a failed device in ZFS and you can't import the pool anymore, don't think you can walk the drive and put the blocks back together manually.<p>You can do this in FAT and mostly in NTFS and UFS - especially if your drive is defragmented.<p>But the complicated data structures of ZFS make this really hard, if not close to impossible - there are tools and ways of using ZFS code to help you do this, but they are exceedingly hard to use.<p>Source: it took me 52 hours to retrieve a 16k file from a failed device. Granted, it would take me less time now, but I now think of devices that have failed as if they had completely disappeared from our universe.
copies=n was not intended to be used in place of mirroring - it was really intended for systems where mirroring wasn't an option. While the initial proposal didn't call this out explicitly, those who were "in the know" on the thread identified it as a feature to help with failures on laptops. For instance, this message from Sept 12, 2006:<p><pre><code> Dick Davies wrote:
> The only real use I'd see would be for redundant copies
> on a single disk, but then why wouldn't I just add a disk?
Some systems have physical space for only a single drive - think most
laptops!
--
Darren J Moffat
</code></pre>
It seems this thread has disappeared from the internet. If anyone is interested in the zfs-discuss@opensolaris.org archives, I can probably convince Gmail to turn them into an mbox and post it somewhere.<p>Edit: format. Sorry mobile users, I really need a block quote here.
On rotating media, I assume copies=n might improve performance... With multiple copies of a piece of data on the platter, whichever copy is closest to the read head can be read first. A 7200 RPM disk takes about 8.3 milliseconds per revolution, so average rotational latency is roughly 4 milliseconds - if you can cut that down by picking a closer copy of the data, you'll get your data back quicker!<p>All this depends on the filesystem issuing reads for <i>all</i> the copies, the drive firmware correctly deciding which to read first, and then the OS being able to cancel the reads of the other copies.<p>I kinda doubt all the above logic is implemented and bug free... Which is sad :-(
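The rotational arithmetic works out like this (just back-of-the-envelope numbers, not anything ZFS actually computes):

```python
# Rotational latency estimate for a 7200 RPM disk.
RPM = 7200
rev_time_ms = 60_000 / RPM          # one full revolution: 60,000 ms / 7200 ≈ 8.33 ms
avg_latency_ms = rev_time_ms / 2    # on average you wait half a revolution: ≈ 4.17 ms

print(f"full revolution: {rev_time_ms:.2f} ms")
print(f"average rotational latency: {avg_latency_ms:.2f} ms")
```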
This was always something that surprised me about ZFS. For a fancy filesystem it largely copied RAID with a trivial layout. I always thought that it would be better to treat the devices as pools of storage and "schedule" writes across them in a much more flexible way. Instead of saying that "these two drives are mirrored" you should be able to say "I want to write two copies of this data" and for example it could pick the two idle drives, or the two emptiest drives. Same with striping, parity and other RAID options.<p>It seems like the only real advantage "ZFS RAID" has over "RAID + ZFS" is that it can handle the actual write requests separately and it has better options for reading when copies are damaged. But it seems like the layout is just as inflexible as a dumb RAID so we aren't gaining as much as we could by combining the two together.<p>(My knowledge may be out of date)<p>copies=n is obviously a step in the right direction but as mentioned it doesn't really provide enough to solve the problem.<p>It seems to me that the only real downside is that you need to store each location of a block, instead of storing one and assuming that the other locations are the same on the "matching" disks.
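To make the "schedule copies across the pool" idea concrete, here's a toy sketch of an allocator that writes n copies to the emptiest drives. This is purely illustrative - the `Drive` class and `place_copies` function are invented for this comment, and real ZFS allocation works nothing like this:

```python
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    capacity: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used

def place_copies(drives, size, copies=2):
    """Write one copy of a block to each of the `copies` drives with the most free space."""
    targets = sorted(drives, key=lambda d: d.free, reverse=True)[:copies]
    if len(targets) < copies or any(d.free < size for d in targets):
        raise RuntimeError("not enough space for all copies")
    for d in targets:
        d.used += size
    return [d.name for d in targets]

drives = [Drive("a", 100, used=90), Drive("b", 100, used=10), Drive("c", 100, used=50)]
print(place_copies(drives, size=5))  # picks the two emptiest drives: ['b', 'c']
```

The catch the comment above already notes: once placement is dynamic per-block, the filesystem has to record every copy's location in the block pointer rather than deriving it from a fixed mirror layout.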
First of all, this needs a (2016) tag.<p>Would an exact duplicate of the existing working drive, maybe done with dd, help with this? Some metadata describing the drive layout would probably have to be changed, too.<p>Other than this workaround, it seems that ZFS could be changed to allow an import again. Has this been changed in recent years?
There was a post recently about building a home fileserver that mentioned a file system that did this - it was just a layer on top of existing disks and file systems and sorted files by directory (and could send a file to multiple drives if so desired).<p>I can’t find it now but it was an interesting website.
This seems such a trivial observation that I fail to see the significance.<p>This option is there for on-device recovery, i.e. resistance to bitrot.<p>A new option involving forward error coding would be even better though. In-filesystem PAR/CRC anyone?
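As a toy illustration of the forward-error-coding idea: the simplest scheme is a single parity block computed by XOR (what RAID-5 does per stripe; real PAR files use Reed-Solomon, which tolerates multiple losses). The helper names here are invented for this sketch:

```python
def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of equal-sized data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(surviving_blocks, parity):
    """Rebuild a single missing block: XOR of all survivors plus parity."""
    return xor_parity(surviving_blocks + [parity])

data = [b"hello___", b"world___", b"zfs_____"]
parity = xor_parity(data)
# Lose data[1], then rebuild it from the survivors and the parity block:
rebuilt = recover([data[0], data[2]], parity)
assert rebuilt == data[1]
```

XOR parity only recovers one lost block per group, which is why PAR and raidz2/raidz3 move to Reed-Solomon-style codes.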