科技回声 is a tech news platform built with Next.js, serving global tech news and discussion.


Optimization Tricks used by the Lockless Memory Allocator

60 points, by jcsalterego, over 13 years ago

2 comments

scott_s, over 13 years ago
This is very similar to a multithreaded memory allocator I implemented many years ago: http://people.cs.vt.edu/~scschnei/streamflow/

Full paper: http://people.cs.vt.edu/~scschnei/papers/ismm06.pdf. I also have an incomplete extension to that paper with more algorithmic details; I can clean it up and send it to interested parties.

Source code: http://github.com/scotts/streamflow/

There's also an excellent paper recently published at PACT 2011 that extends and improves on what we did previously, but the paper is not available anywhere yet: http://aces.snu.ac.kr/Center_for_Manycore_Programming/SFMalloc.html. I have a copy of the paper and can email it to interested people.

Edit after looking through his code: I think he has locks on the main execution path for malloc. That violates one of our design principles, which was to avoid locking on the main path, using only lock-free synchronization for most operations there. (Allocations that hit the page-block allocation in our work do take a lock. The SFMalloc allocator mentioned above takes those locks out and sees a performance improvement.)
huhtenberg, over 13 years ago
A lock-free slab allocator for *smaller blocks* is frequently all that's needed, especially in STL-heavy C++ code.

There is typically a heck of a lot of allocations of 0-32 byte blocks, about half as many of 33-64, half again of 65-128, etc., meaning that if one optimizes allocation of blocks smaller than 512 bytes, it yields 80-90% of the achievable speed gain from a better allocator.

In practical terms this translates into a simpler implementation: just set up a bunch of slab buckets, one for 0-16 byte blocks, the next for up to 32, the next for up to 64, etc., and pass larger allocation requests to the default malloc. The exact size ranges are easy to determine by running the app with a proxy allocator and looking at the block-size histogram. I did this with several projects, and in all cases the histograms stabilized very quickly. The tricky part is lock-free management of the slabs, especially disposal, but it is not rocket science by any means.

A real-world example: adding a lock-free slab allocator to a Windows app that did some multi-threaded data crunching yielded a 400% speedup compared to native HeapAlloc().