TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


Crash Consistency: Rethinking the Fundamental Abstractions of the File System

36 points, by BruceM, almost 10 years ago

3 comments

Animats, almost 10 years ago
The UNIX file system abstraction is very simple, and doesn't define post-crash states. I once proposed different guarantees for different types of files:

- Unit files - the unit of update is the file. Files are created, written, and closed, and are not visible to other processes until closed. Once closed, the file is read-only; it cannot be rewritten, only replaced as a unit. For POSIX-type systems, files created with 'creat' should be created in this mode. O_TRUNC should be interpreted as "replace the old file with the new version on close". If the program aborts before a proper close, the new file should be dropped, leaving the old version intact.

The crash guarantee should be that post-crash, you have a completely written file. It can be either the old version or the new version, but never a partial version. This eliminates the gyrations people go through to get this behavior.

- Log files - the unit of update is the write, which must be at the end. These are files opened for append. Appending is always at the end of the file, even from multiple processes. "seek" is disallowed if the file is open for writing; you can only append.

The crash guarantee should be that post-crash, you have a file which is either complete to the last write, or truncated precisely after some write. The file may not be cut in the middle of a record or trail off into junk.

- Temporary files - after a crash, they're gone.

- Managed files - these are for databases, and support additional functions related to locking and file synchronization. That's what the article is about. For the other types of files, you don't need all those features.

In practice, most files are unit files, log files, or temporary files.
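The "gyrations" alluded to above are the familiar write-to-temp-then-rename dance that applications use today to approximate unit-file semantics on POSIX systems. A minimal sketch (the function name and structure are my own illustration, not from the comment or the article):

```python
import os
import tempfile

def replace_atomically(path, data):
    """Emulate 'unit file' semantics: readers see either the complete
    old version or the complete new version, never a partial file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the new version in the same directory so the rename
    # stays within one filesystem (rename is only atomic then).
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make the new data durable first
        os.replace(tmp, path)      # atomic swap on POSIX
        # fsync the directory so the rename itself survives a crash
        dfd = os.open(dirname, os.O_DIRECTORY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    except BaseException:
        try:
            os.unlink(tmp)         # drop the half-written new version
        except FileNotFoundError:
            pass
        raise
```

Under the proposed unit-file abstraction, all of this bookkeeping (temp file, two fsyncs, cleanup on abort) would collapse into an ordinary create/write/close sequence.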
The number of programs which use managed files is small; mostly they're database programs or libraries.

Programs which use managed files and need data soundness after a crash must be very aware of concurrency and safety semantics. A somewhat different API may be required. There should be two notifications from a write - "data accepted" and "data safely committed". Callers should be able to make blocking writes based on either of those, or make non-blocking writes and get two callbacks. This puts the concurrency management in the database application, which knows what data depends on other data. The file system can't know that, and shouldn't try to.
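The two-notification write API described above could look something like the following sketch. The class name, callback signatures, and threading structure are hypothetical illustrations of the idea, not an existing interface; "accepted" is approximated by write() returning (data in the page cache) and "committed" by fsync() returning (data durable):

```python
import os
import threading

class ManagedWriter:
    """Hypothetical sketch: each write fires 'data accepted' once the
    kernel holds the data, and 'data safely committed' once it is
    durable on stable storage."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def write_async(self, data, on_accepted, on_committed):
        os.write(self.fd, data)       # data accepted: in the page cache
        on_accepted(len(data))

        def commit():
            os.fsync(self.fd)         # data safely committed: durable
            on_committed(len(data))

        t = threading.Thread(target=commit)
        t.start()
        return t                      # caller may join to block on commit

    def close(self):
        os.close(self.fd)
```

A database could acknowledge a client after "accepted" for low latency, or after "committed" for durability, and sequence dependent writes itself, which is exactly the division of labor the comment argues for.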
PhantomGremlin, almost 10 years ago
The discussion and complaints are mostly about Linux filesystems.

It would be interesting to know how well BSD FFS does, and how well ZFS does. Not a whole lot is said about Windows either (my anecdotal experiences with NTFS have been pretty good).

The article does touch on one very serious problem faced by all filesystems: the underlying hardware often lies to the OS. That problem will probably get worse before it gets better. Specifically, flash devices have very complex firmware that is often buggy. Flash devices also do much more to the data behind the scenes (e.g., moving it around for wear leveling).
nickpsecurity, almost 10 years ago
"As previously explained, however, in-order updates (i.e., better crash behavior) are not practical in multitasking environments with multiple applications."

They say this is the obstacle to the *best* approach using the same interface. I say look at our best results in DB, concurrency, and distributed computing research to see if there's a solution that might work given a certain number of cores or applications. There's also the possibility of processor enhancements that put this sort of thing in the I/O architecture.