I was playing around with some stuff that required a 48GB hash table and, to the very best of my ability to understand this stuff, the run time was completely dominated by TLB misses. I say this because, based on my throughput, every lookup was taking the time of about 3 memory accesses on average; i.e. there were page table lookups for every single memory access I made. I don't know the tools that would let me actually monitor the true number of TLB misses.<p>Had I pursued it further, it seems that using a hugepages interface could alleviate this, but hugepages are a royal pain in the ass to get going: they require kernel parameters, a reboot, special memory allocation routines, and praying that your memory doesn't get fragmented. Of course I was doing this in C, and if my application had been in any other language it might have been extremely difficult to get this to work.<p>My use case may have been unusual, but as we store more and more data in RAM it's going to become less unusual. When we care deeply about latency, virtual memory page size is going to be a big problem, and already it seems there are few use cases where 4kB pages are large enough.
I was pondering, a while ago, an operating system that--as well as exposing a raw "allocate me a block of memory" function--exposed a managed, typed key-value representation of virtual memory (picture, say, a Redis kernel module), from which one could allocate hashes, trees, linked-lists, and so forth. Given a NUMA architecture, this K-V store could then just be <i>clustered</i> between each memory pool in the same system in exactly the same way (save optimizations) one would cluster it against remote systems.
Just some background - the solution/benchmark grew out of an index lookup latency issue. In our search engine, we generate enormous b-tree indexes and store them in memory (rsync from master, then mmap). After adding more logic, intersects, and unions, the search engine started to miss its SLA.<p>Eventually, we traced the problem back to additional latency in the vmalloc code path. The get_free_page* API code path had much lower latency, and llds was born (llds uses k*alloc, which is a wrapper around GFP).<p>llds is also being used in low-energy compute environments (like SeaMicro machines), where every CPU cycle is expensive due to increased hardware latency.
I remember when there was a webserver in the Linux kernel. However, it was considered a bug that you couldn't get equal performance from userspace, and eventually it was removed.<p>It should be possible to fix that for this type of case too. A kernel module is the easy solution, though, and gives you a benchmark.
Reminds me of exokernels. Being able to freely roll or adapt your own virtual memory management system tuned to your application was one of the signature uses.
I am not able to fully understand what it is shooting for. The README says it avoids the VM layer (which seems impossible in a pure software solution), but the code suggests it's merely doing a kmem_cache_zalloc. Am I missing something?<p>It's true that VM is an overhead now; with very large memories, the concept of virtual memory is outdated. TLB misses are too high and huge pages just don't cut it. This has been repeated over and over, but we need to re-design the VM/hardware to support TLB-less access for a portion of memory the size of your primary application's working set.