TechEcho

6 comments

masklinn over 6 years ago

If you want an overview of the issue, here's a presentation from Tomas Vondra at FOSDEM 2019: https://youtu.be/1VWIGBQLtxo

Or an early recap of the "fsyncgate" issue in textual form: https://lwn.net/Articles/752063/

Related (also listed by Tomas Vondra): Linux's IO error reporting https://youtu.be/74c19hwY2oE

A previous HN discussion on the subject: https://news.ycombinator.com/item?id=19119991

Also note that this is a broad issue with fsync, so your own software may well be affected. https://wiki.postgresql.org/wiki/Fsync_Errors links to MySQL and MongoDB fixes for the same assumptions, and one of the posts from the original fsyncgate thread mentions that dpkg made the same incorrect assumption.

mjw1007 over 6 years ago

The Linux project takes the view « We don't attempt to rigorously document our API; instead we promise that if your program worked yesterday it will continue to work in the future. »

I think this story shows a weakness in that approach: for rarely exercised error-handling paths, it's too likely that your program didn't work yesterday and you had no easy way to know that.

(This is a separate issue from the fact that until recently the kernel implementation of fsync itself had significant bugs, measured against what its maintainers thought ought to be guaranteed.)

lelf over 6 years ago

> Starting from kernel 4.13, we can now reliably detect such errors during fsync.

No. Not even close. See https://wiki.postgresql.org/wiki/Fsync_Errors

mehrdadn over 6 years ago

Does anyone know if FlushFileBuffers() on Windows also forgets to flush data that previously failed to flush? i.e., if Windows has the same issue or not?

bepvte over 6 years ago

https://edfile.pro/380a8f2

Really enjoyable reading experience...

dooglius over 6 years ago

> To understand it better, consider an example of Linux trying to write dirty pages from page cache to a USB stick that was removed during an fsync. Neither the ext4 file system nor the btrfs nor an xfs tries to retry the failed writes. A silently failing fsync may result in data loss, block corruption, table or index out of sync, foreign key or other data integrity issues… and deleted records may reappear.

As opposed to what? If the drive isn't there anymore, there's not a whole lot that can be done.

> With the new minor version for all supported PostgreSQL versions, a PANIC is triggered upon such error. This performs a database crash and initiates recovery from the last CHECKPOINT.

How is a recovery possible if the hard drive is borked? I don't understand the model that leads to this "fix" making any difference.
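The rule behind the PANIC fix is "never retry a failed fsync". After a writeback error, Linux may already have marked the dirty pages clean and reported the error only once, so a second fsync can return success even though the data never reached storage. The only safe response is to stop and redo the writes from something already durable (for PostgreSQL, WAL replay from the last checkpoint). Below is a minimal sketch of that policy. It assumes Python's `os.fsync` (a thin wrapper over fsync(2)); `durable_write` and `FatalFsyncError` are hypothetical names for illustration, not PostgreSQL APIs:

```python
import os


class FatalFsyncError(Exception):
    """Hypothetical exception: raised when fsync fails; callers must not retry."""


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write `data` to `path`, fsync it, never retry fsync.

    On Linux a failed fsync may already have marked the dirty pages clean,
    so calling fsync again can succeed without the data reaching storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        try:
            os.fsync(fd)
        except OSError as exc:
            # Analogous to PostgreSQL's PANIC: give up instead of retrying,
            # and let a restart redo the writes from a durable log.
            raise FatalFsyncError(f"fsync failed on {path}: {exc}") from exc
    finally:
        os.close(fd)
```

A caller would let `FatalFsyncError` propagate to the top level and exit, rather than catching it and calling `durable_write` again. Note that this only avoids *silent* loss: if the device is gone for good, as in the USB-stick example, recovery still needs an intact copy of the log somewhere.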

Linux Fsync Issue for Buffered IO and Its Preliminary Fix for PostgreSQL