This is a pretty old argument and IMO it's far out of date.<p>Taking full control of your I/O and buffer management is great if (a) your developers are all smart and experienced enough to be kernel programmers and (b) your DBMS is the only process running on the machine. In practice, (a) is never true, and (b) is no longer true because everyone is running apps inside containers inside shared VMs. In the modern application/server environment, no user-level process has accurate information about the total state of the machine; only the kernel (or hypervisor) does, and it's an exercise in futility to try to manage paging, etc., at the user level.<p>As Dr. Michael Stonebraker put it: The Traditional RDBMS Wisdom is (Almost Certainly) All Wrong. <a href="https://slideshot.epfl.ch/play/suri_stonebraker" rel="nofollow noreferrer">https://slideshot.epfl.ch/play/suri_stonebraker</a> (See the slide at 21:25 into the video). Modern DBMSs spend 96% of their time managing buffers and locks, and only 4% doing actual useful work for the caller.<p>Granted, even using mmap you still need to know wtf you're doing. MongoDB's original mmap backing store was a poster child for Doing It Wrong, getting all of the reliability problems and none of the performance benefits. LMDB is an example of doing it right: crash-proof reliability, linear read scalability across arbitrarily many CPUs with zero-copy reads and no wasted effort, and a hot code path that fits in a CPU's 32 KB L1 instruction cache.
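For a sense of what that zero-copy read path looks like in practice, here is a minimal sketch against the LMDB C API; the "./db" path and "hello" key are placeholders for illustration, not anything from the comment above.

    /* Minimal LMDB read sketch: mdb_get() returns a pointer directly into the
     * memory-mapped file, so no copy is made.  Link with -llmdb.
     * "./db" and "hello" are placeholders. */
    #include <lmdb.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;

        mdb_env_create(&env);
        mdb_env_open(env, "./db", MDB_RDONLY, 0664);   /* maps the file once */
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);    /* read-only transaction */
        mdb_dbi_open(txn, NULL, 0, &dbi);

        key.mv_size = strlen("hello");
        key.mv_data = "hello";
        if (mdb_get(txn, dbi, &key, &data) == 0)       /* data.mv_data points into the map */
            printf("%.*s\n", (int)data.mv_size, (char *)data.mv_data);

        mdb_txn_abort(txn);                            /* read-only txns are simply aborted */
        mdb_env_close(env);
        return 0;
    }

The value returned by mdb_get() is a pointer into the mapping itself, which is where the zero-copy claim comes from.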
Another interesting limitation of mmap() is that real-world storage volumes can exceed the virtual address space a CPU can actually map. A 64-bit CPU has 64-bit pointers but typically implements nowhere near 64 address bits, virtually or physically (48 virtual bits, i.e. 256 TiB, is common on x86-64). A normal buffer pool does not have this limitation. You can get EC2 instances on AWS with more direct-attached storage than the local microarchitecture can address virtually.
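If you want to see the real limits on a given box, CPUID leaf 0x80000008 reports the implemented physical and linear (virtual) address widths. A quick check, assuming GCC/Clang on an x86-64 host:

    /* Query the CPU's physical and virtual (linear) address widths.
     * CPUID leaf 0x80000008 returns physical bits in EAX[7:0] and
     * linear bits in EAX[15:8]. */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 0x80000008 not supported\n");
            return 1;
        }
        unsigned int phys_bits = eax & 0xff;
        unsigned int virt_bits = (eax >> 8) & 0xff;
        printf("physical: %u bits (%.0f TiB), virtual: %u bits (%.0f TiB)\n",
               phys_bits, (double)(1ULL << phys_bits) / (1ULL << 40),
               virt_bits, (double)(1ULL << virt_bits) / (1ULL << 40));
        return 0;
    }

On a typical x86-64 server this prints something like 46 bits physical / 48 bits virtual, i.e. 256 TiB of virtual space, and Linux typically hands only half of that to user space.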
Not just databases - we ran into the same issues when we needed a high-performance caching HTTP reverse proxy for a research project. We were going to drop in Varnish, which is mmap-based, but performance was so poor that we had to write our own.<p>Note that Varnish dates to 2006, in the days of hard disk drives, SCSI, and 2-core server CPUs. Mmap might well have been as good as, or even better than, explicit I/O back then - a lot of the issues discussed in this paper (TLB shootdown overhead, a single flush thread) get much worse as the core count increases.
Related:<p><i>Are You Sure You Want to Use MMAP in Your Database Management System? [pdf]</i> - <a href="https://news.ycombinator.com/item?id=31504052">https://news.ycombinator.com/item?id=31504052</a> - May 2022 (43 comments)<p><i>Are you sure you want to use MMAP in your database management system? [pdf]</i> - <a href="https://news.ycombinator.com/item?id=29936104">https://news.ycombinator.com/item?id=29936104</a> - Jan 2022 (127 comments)
Many general-purpose OS abstractions start to leak when you're building systems-level software.<p>You notice it when web servers do kernel bypass for zero-copy, low-latency networking, or when database engines throw away the kernel's page cache to implement their own file buffering.
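The page-cache half of that usually means O_DIRECT: read straight from the device into an aligned buffer the application manages itself. A minimal sketch; the file name and the 4 KiB block size are placeholders, and O_DIRECT's alignment requirements vary by filesystem and device.

    /* Bypass the kernel page cache with O_DIRECT and read into an aligned,
     * application-managed buffer -- the core move of a DIY buffer pool. */
    #define _GNU_SOURCE            /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK 4096             /* placeholder; must match the device's logical block size */

    int main(void) {
        int fd = open("data.db", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, BLOCK, BLOCK) != 0) return 1;  /* O_DIRECT needs aligned memory */

        ssize_t n = pread(fd, buf, BLOCK, 0);                   /* offset and length must also be aligned */
        if (n < 0) perror("pread");
        else printf("read %zd bytes, cached by us, not the kernel\n", n);

        free(buf);
        close(fd);
        return 0;
    }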
It sounds like a lot of the performance issues are TLB-related. Am I right in thinking huge pages would help here? If so, it's a bit unfortunate they didn't test this in the paper.<p>Edit: Hm, it might not be possible to mmap regular files with huge pages. This LWN article[1] from five years ago talks about the work that would be required, but I haven't seen any follow-ups.<p>[1]: <a href="https://lwn.net/Articles/718102/" rel="nofollow noreferrer">https://lwn.net/Articles/718102/</a>
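For what it's worth, huge pages are straightforward for anonymous memory (i.e. a buffer pool you manage yourself); it's regular file-backed mappings that are the hard case the LWN article covers. A hedged sketch, assuming huge pages have been reserved via /proc/sys/vm/nr_hugepages; the 64 MiB size is a placeholder.

    /* Huge pages are easy to get for anonymous memory (a DIY buffer pool),
     * but not for regular file-backed mmap()s. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #define POOL_SIZE (64UL << 20)   /* 64 MiB buffer pool -- placeholder size */

    int main(void) {
        /* Explicit 2 MiB huge pages for an anonymous mapping. */
        void *pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (pool == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");   /* fails if no huge pages are reserved */
            /* Fallback: normal mapping plus a transparent-huge-page hint. */
            pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (pool == MAP_FAILED) { perror("mmap"); return 1; }
            madvise(pool, POOL_SIZE, MADV_HUGEPAGE);   /* hint only; kernel may ignore it */
        }
        printf("buffer pool at %p\n", pool);
        munmap(pool, POOL_SIZE);
        return 0;
    }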
Memory-mapped files = hard faults when a disk read fails: SIGBUS on POSIX, an in-page I/O error structured exception on Windows. If you're not prepared to handle those, don't use memory-mapped files. (They land on an ordinary memory access, much like dereferencing a null pointer, rather than as an error return from a read call.)<p>Then there's the delayed-write problem. Be prepared for blocks not necessarily reaching disk in the order they were written, and perhaps ten seconds after the fact. This can turn a power failure into an inconsistent database.
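On Linux that failure arrives as SIGBUS at whatever instruction touched the page. A crude sketch of surviving it with sigsetjmp/siglongjmp; data.db is a placeholder, and a real system would need per-thread jump buffers and careful signal-safety.

    /* A failed page-in of a memory-mapped file is delivered as SIGBUS at the
     * access site.  Bracket mapped accesses with sigsetjmp/siglongjmp to
     * survive it.  Sketch only. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf jump_buf;

    static void on_sigbus(int sig) {
        (void)sig;
        siglongjmp(jump_buf, 1);          /* unwind back to the access site */
    }

    int main(void) {
        struct sigaction sa = {0};
        sa.sa_handler = on_sigbus;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        int fd = open("data.db", O_RDONLY);            /* placeholder file */
        if (fd < 0) { perror("open"); return 1; }
        char *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        if (sigsetjmp(jump_buf, 1) == 0)
            printf("first byte: %d\n", p[0]);          /* may fault if the page-in fails */
        else
            fprintf(stderr, "I/O error surfaced as SIGBUS during mapped read\n");

        munmap(p, 4096);
        close(fd);
        return 0;
    }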
The TL;DR is that mmap sort of does what you want, but DBMSes need more control over how and when data is paged in and out of memory. Without that extra control, you run into problems with both transactional safety and performance.
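For context, the control mmap does give you is roughly madvise() hints plus msync() on a range, and that isn't enough: msync(MS_SYNC) forces a given range out, but the kernel may still write back any other dirty page whenever it likes, which is what breaks log-before-data ordering. A minimal sketch of those calls; the path and sizes are placeholders.

    /* The control mmap does offer: madvise() hints and msync() on a range.
     * Note that dirty pages outside the msync'd range can still be written
     * back at any time on the kernel's schedule. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define PAGE 4096

    int main(void) {
        int fd = open("data.db", O_RDWR);              /* placeholder file */
        if (fd < 0) { perror("open"); return 1; }

        char *map = mmap(NULL, PAGE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        madvise(map, PAGE, MADV_SEQUENTIAL);           /* a hint, not a guarantee */

        memcpy(map, "dirty page", 10);                 /* dirty the page in place */

        /* Force just this range to stable storage now. */
        if (msync(map, PAGE, MS_SYNC) != 0) perror("msync");

        munmap(map, PAGE);
        close(fd);
        return 0;
    }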
For all of its usefulness in the good old days of rusty disks, I wonder whether virtual memory is worth having at all for dedicated databases, caches, and storage heads. Avoiding TLB flushes entirely sounds like a huge win for massively multithreaded software, and memory management in a large shared flat address space doesn't sound impossibly hard.
I've become convinced that there are very few, if any, reasons to MMAP a file on disk. It seems to simplify things in the common case, but in the end it adds a massive amount of unnecessary complexity.
A well-written bespoke function can beat a generalized one at a specific task.<p>If you have the resources to write and maintain the bespoke version, great - the large database developers probably do. For everyone else, please don't take this link and go around claiming mmap is bad; that gets tiresome and is misguided. Mmap is a shortcut for accessing large files in a non-linear fashion, and it's good at that. Just not as good as a bespoke implementation.
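That shortcut is real, to be fair: random access to a big file through a single pointer, with no read()/lseek() bookkeeping. A minimal sketch; big.dat and the offsets are placeholders.

    /* The convenience case for mmap: treat a big file as an array and poke at
     * arbitrary offsets. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("big.dat", O_RDONLY);            /* placeholder file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        const unsigned char *data =
            mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* Non-linear access: the kernel pages in whatever we happen to touch. */
        off_t offsets[] = { 0, st.st_size / 2, st.st_size - 1 };
        for (int i = 0; i < 3; i++)
            printf("byte at %lld: %u\n", (long long)offsets[i], data[offsets[i]]);

        munmap((void *)data, st.st_size);
        close(fd);
        return 0;
    }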