The diagram is a bit misleading. There is no system call (switching to the kernel) on read or write hits, assuming the memory is already mapped, has a page table entry, and is resident. You only incur switches when performing the mapping and on the various page fault paths (lazy mapping, non-resident data, copy-on-write).<p><i>> This usually happens when the ratio of storage size to RAM size is significantly higher than 1:1. Every page that is brought into cache causes another page to be evicted.</i><p>While true, you can optimize around this by unmapping larger ranges in bulk (to reduce cache pressure) or prefetching them (to reduce blocking) with madvise, letting the kernel do the loading asynchronously while you're still working on the previously prefetched ranges.<p>If you know your read and write patterns well, you can effectively use this as nearly asynchronous IO without the pains of AIO and with few to no extra threads.
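A minimal sketch of that prefetch/drop pattern, assuming a large read-only file mapping; the window size and file name are just placeholders:
<pre><code>
/* Sketch: stream over a large mmap'ed file, hinting the next window
 * with MADV_WILLNEED and dropping the previous one with MADV_DONTNEED.
 * Window size, file name, and error handling are illustrative. */
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;unistd.h&gt;

#define WINDOW (64UL << 20)   /* 64 MiB chunks; a multiple of the page size */

int main(void)
{
    int fd = open("big.data", O_RDONLY);          /* placeholder file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    size_t len = st.st_size;

    char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    volatile char sink = 0;
    for (size_t off = 0; off < len; off += WINDOW) {
        size_t chunk = len - off < WINDOW ? len - off : WINDOW;
        size_t next  = off + WINDOW;

        /* Hint: start reading the next window in the background. */
        if (next < len)
            madvise(base + next, len - next < WINDOW ? len - next : WINDOW,
                    MADV_WILLNEED);

        /* Stand-in for real processing of base[off .. off+chunk). */
        for (size_t i = 0; i < chunk; i += 4096)
            sink ^= base[off + i];

        /* Hint: done with the previous window; drop our mapping of it
           so the kernel can reclaim those pages under pressure. */
        if (off >= WINDOW)
            madvise(base + off - WINDOW, WINDOW, MADV_DONTNEED);
    }

    munmap(base, len);
    close(fd);
    return 0;
}
</code></pre>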
Very nice job, I like the diagrams and the description.<p>Noticed this bit "block size which is typically 512 or 4096 bytes" and was wondering how the application would know how to align. Does it query the file descriptor for the block size? Is there an ioctl call for that?<p>When it comes to IO it's also possible to differentiate between blocking/non-blocking and synchronous/asynchronous, and those categories are orthogonal in general.<p>So there is blocking synchronous: read, write, readv, writev. The calling thread blocks until data is ready and while it gets copied to user memory.<p>Non-blocking synchronous: using non-blocking file descriptors with select/poll/epoll/kqueue. Checking when data is ready is done asynchronously, but then read/write still happens inline and the thread waits for the data to be copied to user space. This works for socket IO on Linux but not for disk.<p>Non-blocking asynchronous: using AIO on Linux. Here both waiting until data is ready to be transferred and the transfer itself happen asynchronously. aio_read returns right away, before the read has finished; then you have to use aio_error to check its status. This works for disk but not socket IO on Linux.<p>Blocking asynchronous: nothing here.
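A minimal sketch of the non-blocking asynchronous case with POSIX AIO; the file name and buffer size are just placeholders, and note that on glibc this interface is serviced by a user-space thread pool rather than a true kernel async path:
<pre><code>
/* Sketch: non-blocking asynchronous read with POSIX AIO.
 * aio_read() returns immediately; aio_error() reports EINPROGRESS
 * until the transfer completes, then aio_return() yields the byte
 * count. Link with -lrt on glibc. */
#include &lt;aio.h&gt;
#include &lt;errno.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;

int main(void)
{
    int fd = open("some.file", O_RDONLY);   /* placeholder file name */
    if (fd < 0) { perror("open"); return 1; }

    static char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    /* Do other work here; the read proceeds in the background. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);                 /* or check between other tasks */

    ssize_t n = aio_return(&cb);      /* completed: fetch the result */
    if (n < 0) perror("aio_read (completion)");
    else printf("read %zd bytes asynchronously\n", n);

    close(fd);
    return 0;
}
</code></pre>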
> The great advantage of letting the kernel control caching is that great effort has been invested by the kernel developers over many decades into tuning the algorithms used by the cache.<p>Some other advantages:<p>The kernel has a global view of what is going on with all the different applications running on the system, whereas your application only knows about itself.<p>The cache can be shared amongst different applications.<p>You can restart applications and the cache will stay warm.
Nice article. I agree with the strategy. No matter how clever the OS is in general, it cannot be more clever than a well-designed db kernel.<p>A little off topic, but I have been waiting over a decade for Linux to merge all async waiting into one system call.<p>Wouldn't it be nice if there were a kqueue system call in POSIX? It would then force Linux to finally implement it.
The author's name, Avi Kivity, sounded familiar. Turns out he created KVM, the Linux kernel hypervisor.<p><a href="https://il.linkedin.com/in/avikivity" rel="nofollow">https://il.linkedin.com/in/avikivity</a>
Great article for a noob like me. Any suggestions for similar articles that give a good high-level overview of kernel internals, specifically IO topics like what happens when an application requests a file, right from issuing the system call down to the disk interactions?<p>I was always confused, and still am, about the different caching layers involved in an IO operation.
There's also a class of functions that act like read/write but do the transfer entirely in the kernel, without the data passing through userspace (sendfile/(vm)splice/copy_file_range).
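For illustration, a minimal sketch of an in-kernel copy using copy_file_range; the file names are placeholders, and sendfile follows a similar pattern with an out_fd/in_fd pair:
<pre><code>
/* Sketch: in-kernel file copy with copy_file_range(2); the data never
 * passes through a user-space buffer. Needs Linux 4.5+ and glibc 2.27+. */
#define _GNU_SOURCE
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;unistd.h&gt;

int main(void)
{
    int in  = open("src.dat", O_RDONLY);                       /* placeholders */
    int out = open("dst.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(in, &st) < 0) { perror("fstat"); return 1; }
    off_t remaining = st.st_size;

    while (remaining > 0) {
        ssize_t n = copy_file_range(in, NULL, out, NULL, remaining, 0);
        if (n <= 0) { perror("copy_file_range"); return 1; }
        remaining -= n;   /* file offsets advance automatically when NULL */
    }

    close(in);
    close(out);
    return 0;
}
</code></pre>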
They seem to target the situation where the storage-to-memory ratio is very high.<p>But this raises the question: isn't this ratio decreasing with RAM getting cheaper and cheaper these days?<p>(I've seen a lot of systems already moving to completely in-memory databases, which takes this to the extreme, so this is already a reality.)
What are the typical use cases for direct I/O? I've never stumbled upon anything that used it. Intuitively, I would assume specialized logging applications or databases, but even these seem to use other mechanisms.