The whole fsck discussion seems baffling to me.<p>While I'm no ZFS expert, I've been using it for several years now, and my understanding is this: ZFS takes what a normal fsck-type tool does and builds those features into the underlying FS and its supporting toolchain. Given what ZFS does and how it works, it really doesn't make sense to me <i>at all</i> for it to have an "fsck," whatever that would even mean. It's hard to even imagine what an "fsck" would do for ZFS: you'd just end up rewriting bits of the toolchain or asking for the impossible.<p>I asked this in the other thread, but I'll ask here again: semantics aside, what specifically do people want fsck to do that ZFS doesn't already provide a method for? To me the question seems akin to asking why manufacturers don't publish the RPM spec for SSDs. It's a really odd thing to ask, and it can't be answered without an exhaustive review of the mechanics of the system.<p>I can't help but get the feeling that a lot of people complaining about ZFS have very little familiarity with it, or with BSD/Unix in general. ZFS is not like any Linux FS: it doesn't use fstab, the toolchain is totally different, the FS is fundamentally different. It was built for Solaris and really reflects that ideology, which is completely foreign to people who only know Linux. Accept it and move on or don't, but I've yet to see any evidence to back up these complaints other than "this is what is done in Linux for everything else," which is just FUD.
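For the record, ZFS already ships the "check my disks" workflow as first-class commands. A quick sketch (pool name hypothetical):<p><pre><code> zpool scrub tank      # walk every allocated block in the pool, verify
                       # all checksums, and self-heal from redundancy
                       # where possible
 zpool status -v tank  # report scrub progress and list any errors
                       # found, down to the affected files
</code></pre>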
Interesting rant (from 2009). At NetApp, the WAFL file system is also always consistent on disk, so it too doesn't need fsck. That said, WAFL had 'wack' (WAfl ChecK), which could go through and check that the on-disk image was correct.<p>Unlike UFS or FFS or EXTn, the file system couldn't be corrupted by loss of power mid-write, but like ZFS it can be corrupted by bugs in the code which <i>write a corrupted version to disk</i>. So the tool does something similar to fsck, but it is simpler: more of a data-structure check than a "recreate the flow of buffers through the buffer cache to save as much as possible" exercise.
1. When there is a bug in the code that writes the ZFS on-disk state, why should the bug be addressed by fsck code? This would assume that you know of the bug beforehand, but then you would do better to fix the bug in the code that writes.<p>2. When there is a bug in the on-disk state, it should be addressed by the code that reads the data, not by an fsck tool.<p>2.1. The correction of a bug in the on-disk state should be done on the basis of exact knowledge about the bug, not by a generic check tool.<p>3. Repair is always based on assumptions, which may be correct or incorrect. The more you know about the problem that led to the repair-worthy state, the more probable it is that the assumptions are correct.<p>4. What is the reasoning behind the argument that when your metadata is corrupt, the data is still correct, so you could repair the metadata corruption without problems? It sounds more sensible to fall back to the last known correct and consistent state of metadata and data: the on-disk state represented by the pointer structure of the uberblock with the highest transaction group commit number that has a correct checksum. The transaction group rollback at mount does exactly this.
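(As an aside, you can inspect exactly the state that rollback targets; zdb will print the active uberblock. A sketch, pool name hypothetical:)<p><pre><code> zdb -u tank   # print the active uberblock: its transaction group
               # number, timestamp, and the root block pointer that
               # the entire pool state hangs off
</code></pre>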
I lost a ZFS pool once. The cause ultimately turned out to be a slowly failing PSU. (It was an expensive OCZ PSU, too, which is why I didn't suspect it as quickly as I probably should have. OCZ did replace it under warranty without argument.)<p>It was a development machine, so it wasn't being backed up. I thought it was just one disk going bad; by the time it was clear that it was something worse than that, it was too late. Most of the important contents of the pool had been checked into the VCS, but not everything. I wound up grepping the raw disk devices to find the latest versions of a couple of files.<p>Any filesystem would have had serious trouble in such a situation, of course. But I can't help thinking that picking up the pieces might have been easier with, say, EXT3.<p>On the other hand, I think it speaks well for ZFS that a slowly failing PSU seems to be almost the only way to lose a pool.
So if you have an unmountable ZFS pool, instead of reaching for fsck (which doesn't and won't exist) you can do:<p><pre><code> zpool clear -F data
</code></pre>
And it will take advantage of the copy-on-write nature of ZFS to roll back to the last verifiably consistent state of your filesystem. That's a lot better than the uncertain fixes applied by fsck to other file systems. It even tells you how old the restored state is.
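Two related options worth knowing, if I remember the man pages correctly (pool name as above): a dry run of the rewind, and the same recovery for a pool that won't even import:<p><pre><code> zpool clear -Fn data   # check whether discarding the last few
                        # transactions would make the pool openable,
                        # without actually discarding anything
 zpool import -F data   # the same rewind recovery, applied while
                        # importing an unopenable pool
</code></pre>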
A slight disagreement: the advantage of a ZFS online consistency checker would be to help ensure that there are no bugs in ZFS.<p>It appears that ZFS lacks a full consistency checker -- scrub only walks the tree and computes checksums; notably absent from this procedure is any validation of the DDT (dedup table). While ZFS claims to be always on-disk consistent--and I certainly believe that the intent is that it be so!--I seem to have tripped over some bug ( <a href="http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016627.html" rel="nofollow">http://lists.freebsd.org/pipermail/freebsd-fs/2013-March/016...</a> ) which corrupted the DDT, and now I have no way of rebuilding it, so I dropped $$$ (for me) on a new disk array and did a zfs send | zfs recv so that everything was rebuilt from scratch. That's sort of crazy, if I may be so bold.<p>I suppose I could take the pool offline for several days and poke at it with zdb, but that is not really desirable either.
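(For anyone who lands in the same spot: zdb can at least report on the DDT without days of offline poking. A sketch, pool name hypothetical:)<p><pre><code> zdb -DD tank   # print dedup (DDT) statistics, including a
                # histogram of reference counts
 zdb -S tank    # simulate dedup over the pool's current data and
                # display the table it would build
</code></pre>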
The article doesn't ever consider that <i>ZFS</i> might have bugs. Dodgy disks, bad firmware, power failures, yes. But no consideration that the ZFS code could contain problems.<p>If you are happy that the ZFS code is perfect, then it makes sense to rely upon its consistency checks, snapshot features, etc (and I'm not criticising those). But what if ZFS isn't 100%? How do you recover your data?
ZFS may not need fsck, but it would be great if Oracle would re-open-source it. I've considered using it, but I can't trust that it has a future.<p>I'm also rather confused by Oracle contributing to btrfs while also building ZFS privately. My intuition is that if they open-sourced ZFS and offered it under a dual BSD/GPL license, it would become the fs standard overnight.
And after a dozen paragraphs on how ZFS is unlikely to get corrupted, the meat of the content: "my opinion is that you shouldn't try to repair it anyway".<p><i>Anyway: You do not repair the last state of the data. And in my opinion: You should not try to repair it ... at least not by automatic means. Such a repair would be risky in any case. [..] In this situation i would just take my money from the table and call it a day. You may lose the last few changes, but your tapes are older.</i><p>This "you do not need an emergency repair tool because in an emergency I think you should just forget it" is exactly the claim that this blog post was supposed to be countering. It sets out to explain why a do-the-best-you-can repair utility is unnecessary, and the argument boils down to "because I don't think you should do that".
Been running it since 0.6.0-rc14 on a ProLiant MicroServer with ECC RAM and I am happy with it. 4x2TB RAIDZ internal, and 2x1TB USB3 zpools, with an SSD for the ZIL (intent log). Shared over GigE using Samba4 and AFP.<p>Performance is decent enough with lz4 compression on and dedup off. Dedup on takes more CPU, but nothing the 2.2GHz Turion can't handle. The main thing is that stability has improved a lot too.<p>If you want the utmost performance, maybe this isn't for you, but for NAS/backup/streaming-type usage ZFS on Linux is nearly perfect.
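For anyone replicating the setup, these are the two knobs I mean. A sketch, dataset name hypothetical:<p><pre><code> zfs set compression=lz4 tank/share   # cheap on CPU, decent ratios
 zfs set dedup=off tank/share         # dedup trades RAM and CPU for
                                      # space; off is the default
</code></pre>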
Maybe I'm just tired this morning, but I wish this article would just get to the point. I feel like I'm reading a mystery novel, except I'm never going to make it to the end and find out who did it.
It reminds me of how people used to think all filesystems needed to be explicitly defragmented because of design flaws in FAT, which was designed for floppies (and wasn't especially well-designed even at that).<p><a href="http://geekblog.oneandoneis2.org/index.php/2006/08/17/why_doesn_t_linux_need_defragmenting" rel="nofollow">http://geekblog.oneandoneis2.org/index.php/2006/08/17/why_do...</a><p><a href="http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/" rel="nofollow">http://www.howtogeek.com/115229/htg-explains-why-linux-doesn...</a>