It's easier to write the system's frontend while paying little attention to the backend and "just" letting a local filesystem do a lot of the work for you, but it doesn't work well. The interesting question is whether the result is also that the frontend-to-backend communication abstraction is good enough to replace the backend with a better solution. I'm not familiar enough with Ceph and BlueStore to have an opinion on that.

I happen to work for a distributed file-system company, and while I don't work on the filesystem part itself, the old saying "it takes software 10 years to mature" is so true in this domain.
It really is true. I spent years of my life wrangling a massive GlusterFS cluster and it was awful. You basically can't do any kind of filesystem operation on it that isn't CRUD on well-known, specific paths. Anything else (traversal, moving/copying, linking, updating permissions) would just hang forever. You're also at the mercy of the kernel driver, which does hate you, personally. You will have nightmares about uninterruptible sleep. Migrating it all to S3 over Ceph was a beautiful thing.
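For anyone facing the same migration: the win is roughly that "traversal" becomes a paginated LIST against an HTTP endpoint instead of a kernel mount that can wedge the whole box. A minimal sketch with boto3 against a Ceph RGW S3 endpoint; the endpoint URL, credentials, bucket, and prefix are all placeholders, not any real setup:

    import boto3

    # Ceph RGW speaks the S3 API, so a plain S3 client works; the endpoint,
    # credentials, bucket, and prefix below are hypothetical placeholders.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://rgw.example.internal",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # "Traversal" becomes a paginated list-by-prefix: no kernel client,
    # no uninterruptible sleep, and a hung request is just a timeout to retry.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket", Prefix="logs/2024/"):
        for obj in page.get("Contents", []):
            print(obj["Key"], obj["Size"])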
See also "Hierarchical File Systems are Dead" by Margo Seltzer and Nicholas Murphy <a href="https://www.usenix.org/legacy/events/hotos09/tech/full_papers/seltzer/seltzer.pdf" rel="nofollow">https://www.usenix.org/legacy/events/hotos09/tech/full_paper...</a>
Lots of these issues aren't specific to distributed systems; they also bite local single-node systems. Notable examples are PostgreSQL's fsyncgate, and how mail servers struggled in the past (IIRC that was one of the cases where ReiserFS shined).
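The fsyncgate lesson in particular generalizes well beyond Postgres: on Linux a failed fsync() may drop the dirty pages and clear the error state, so a retry can "succeed" without the data ever becoming durable. A rough sketch of the safe reaction (crash and recover from the WAL rather than retry), with a hypothetical helper name:

    import os
    import sys

    def durable_append(path: str, data: bytes) -> None:
        # Hypothetical helper: write then fsync, treating fsync failure as fatal.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            os.write(fd, data)
            try:
                os.fsync(fd)
            except OSError as exc:
                # Do NOT loop and retry: after a failed fsync the page cache
                # state is unknown, and a later fsync may report success anyway.
                sys.exit(f"fsync failed ({exc}); aborting so recovery replays the WAL")
        finally:
            os.close(fd)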
Noooo, really?<p>It all depends on what you want to do. For things that are already in files like all that data that DeepSeek and other models train on and for which DS open sourced their own distributed file system, it makes sense to go with a distributed file system.<p>For OLTP you need a database with appropriate isolation levels.<p>I know someone will build a distributed file system on top of FoundationDB if they haven’t yet.
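The metadata half of that is pretty natural, at least. A toy sketch of create/rename as FoundationDB transactions, loosely following the patterns from the official Python tutorial; the ("inode", path) key layout and the stored (size, mode) tuple are made up for illustration, not any real project's schema:

    import fdb

    fdb.api_version(710)
    db = fdb.open()  # uses the default cluster file

    @fdb.transactional
    def create(tr, path, size, mode):
        key = fdb.tuple.pack(("inode", path))
        if tr[key].present():
            raise FileExistsError(path)
        tr[key] = fdb.tuple.pack((size, mode))

    @fdb.transactional
    def rename(tr, old_path, new_path):
        old_key = fdb.tuple.pack(("inode", old_path))
        val = tr[old_key]
        if not val.present():
            raise FileNotFoundError(old_path)
        size, mode = fdb.tuple.unpack(val)
        tr[fdb.tuple.pack(("inode", new_path))] = fdb.tuple.pack((size, mode))
        del tr[old_key]  # both mutations commit atomically, or not at all

    create(db, "/data/a.parquet", 0, 0o644)
    rename(db, "/data/a.parquet", "/data/b.parquet")

The hard part a real distributed file system still has to solve is the data path; the KV store only buys you serializable metadata.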
Before Bluestore, we ran Ceph on ZFS with the ZFS Intent Log on NVDIMM (basically non-volatile RAM backed by a battery). The performance was extremely good. Today we run Bluestore on ZVOLs on the same setup, and if the zpool is a "hybrid" pool we put the Ceph OSD databases on an all-NVMe zpool. Ceph's WAL wants a disk slice for each OSD, so we skip the Ceph WAL and instead consolidate incoming writes on the ZIL/SLOG on NVDIMM.
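In case it helps anyone reproduce the shape of that layout, here is a rough Python wrapper around the usual zpool/zfs/ceph-volume commands. Every pool name, device, and size is a placeholder (the real layout isn't given above); the point is only the structure: SLOG on NVDIMM, OSD data on a ZVOL, block.db on an all-NVMe pool, and no separate block.wal device:

    import subprocess

    def run(*cmd):
        # Thin wrapper that echoes and executes each CLI call.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Hybrid data pool with the intent log (SLOG) on an NVDIMM-backed device.
    # Device names are placeholders; NVDIMMs typically show up as /dev/pmemN.
    run("zpool", "create", "hybridpool", "raidz2",
        "/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd",
        "log", "/dev/pmem0")

    # All-NVMe pool that holds the OSDs' RocksDB (block.db) volumes.
    run("zpool", "create", "nvmepool", "mirror", "/dev/nvme0n1", "/dev/nvme1n1")

    # One ZVOL for OSD data, one for its DB, then hand both to ceph-volume.
    # Note: no --block.wal, so write consolidation happens on the ZFS SLOG.
    run("zfs", "create", "-V", "2T", "hybridpool/osd0")
    run("zfs", "create", "-V", "64G", "nvmepool/osd0-db")
    run("ceph-volume", "lvm", "create",
        "--data", "/dev/zvol/hybridpool/osd0",
        "--block.db", "/dev/zvol/nvmepool/osd0-db")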