If you want an overview of the issue, here's a presentation from Tomas Vondra at FOSDEM 2019: <a href="https://youtu.be/1VWIGBQLtxo" rel="nofollow">https://youtu.be/1VWIGBQLtxo</a><p>Or an early recap of the "fsyncgate" issue in textual form: <a href="https://lwn.net/Articles/752063/" rel="nofollow">https://lwn.net/Articles/752063/</a><p>Related (also listed by Tomas Vondra): Linux's I/O error reporting: <a href="https://youtu.be/74c19hwY2oE" rel="nofollow">https://youtu.be/74c19hwY2oE</a><p>A previous HN discussion on the subject: <a href="https://news.ycombinator.com/item?id=19119991" rel="nofollow">https://news.ycombinator.com/item?id=19119991</a><p>Also note that this is a broad issue with fsync, and your own software may be affected: <a href="https://wiki.postgresql.org/wiki/Fsync_Errors" rel="nofollow">https://wiki.postgresql.org/wiki/Fsync_Errors</a> links to MySQL and MongoDB fixes for the same assumptions, and one of the posts from the original fsyncgate thread mentions that dpkg made the same incorrect assumption.
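<p>For anyone checking their own code, here is a minimal Python sketch of the conservative pattern: treat a failed fsync as fatal for that write instead of calling fsync again and trusting a later success. The names `write_durably` and `DurabilityError` and the injectable `fsync` parameter are hypothetical, made up for this illustration; only `os.open`, `os.write`, `os.fsync` and `os.close` are real library calls. It does not fsync the containing directory and is not how PostgreSQL implements its fix.

```python
import os


class DurabilityError(Exception):
    """fsync failed: the data may never have reached stable storage."""


def write_durably(path, data, fsync=os.fsync):
    """Write `data` (bytes) to `path` and fsync it once.

    Hypothetical helper for illustration. `fsync` is injectable only so the
    failure path can be exercised in a test.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # Do NOT retry fsync here: on some kernels the failed pages may
            # already be marked clean, so a second call can succeed without
            # the data being on disk. Surface the failure to the caller.
            raise DurabilityError(f"fsync failed for {path!r}") from exc
    finally:
        os.close(fd)
```

<p>On `DurabilityError` the caller should rebuild the data from a source it still trusts (a log, the original input) rather than assume the file contents are good.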
The Linux project takes the view « We don't attempt to rigorously document our API; instead we promise that if your program worked yesterday it will continue to work in the future. »<p>I think this story shows a weakness in that approach: for rarely-exercised error handling paths, it's too likely that your program didn't work yesterday and you had no easy way to know that.<p>(This is a separate issue from the fact that until recently the kernel implementation of fsync itself had significant bugs, measured against what its maintainers thought ought to be guaranteed.)
> Starting from kernel 4.13, we can now reliably detect such errors during fsync.<p>No. Not even close.<p>See <a href="https://wiki.postgresql.org/wiki/Fsync_Errors" rel="nofollow">https://wiki.postgresql.org/wiki/Fsync_Errors</a>
Does anyone know if FlushFileBuffers() on Windows also forgets to flush data that previously failed to flush? i.e., if Windows has the same issue or not?
> To understand it better, consider an example of Linux trying to write dirty pages from page cache to a USB stick that was removed during an fsync. Neither the ext4 file system nor the btrfs nor an xfs tries to retry the failed writes. A silently failing fsync may result in data loss, block corruption, table or index out of sync, foreign key or other data integrity issues… and deleted records may reappear.<p>As opposed to what? If the drive isn't there anymore, there's not a whole lot that can be done.<p>> With the new minor version for all supported PostgreSQL versions, a PANIC is triggered upon such error. This performs a database crash and initiates recovery from the last CHECKPOINT.<p>How is a recovery possible if the hard drive is borked? I don't understand the model that leads to this "fix" making any difference.