> When we free memory, we should make sure that if the block we return to the free list is next to any other free blocks, we combine them together. This is called "coalescing."<p>A little off-topic, but the default Delphi 7 memory allocator did this, except that it also merged blocks that it obtained from different OS allocation calls.<p>This worked fine for regular usage, but if that memory was ever used for Bitmaps for UI stuff, it wouldn't work: since Windows does some of its UI stuff in kernel mode, before doing anything Windows would attempt to lock the entire allocation's VAD entry to prevent you from messing with it in another thread while it was using it. If the Bitmap you were working on happened to belong to two different OS-level allocations, this lock function would fail, since it wasn't meant to handle that case.<p>This would lead to random, extremely cryptic errors such as ERROR_NO_SCROLLBARS "The window does not have scroll bars.", since the lock call was deep in the stack and the callers replaced the error with another one as it bubbled up.<p>In the end we had to replace the allocator to avoid the issue. That was a fun day I spent debugging.
The article claims that an allocator that splits memory based on allocation size is called a "buddy allocator". That's misleading: an allocator that allocates an area for each size class is usually called a "slab allocator", while a "buddy allocator" is one that, when needed, subdivides a power-of-two-sized memory area into two half-sized areas that are "buddies", does so recursively to satisfy allocations, and coalesces the buddies again when both are free.<p>E.g. the Linux kernel used (not sure if it's still like this) a buddy allocator to allocate pages and power-of-two blocks of pages, and slab allocators to subdivide those pages and allocate data structures.<p>Another important thing the article doesn't mention is that most production allocators make use of thread-local storage and either have per-thread caches of free blocks or sometimes whole per-thread memory regions. This is to reduce lock contention and provide memory that is more likely to be in the current core's cache.
This is absolute gold. When I use things like this, I am reminded how powerful digital learning can be. So much more capable than just text or video. But very little content is this well put together. Probably because it is so time intensive.
Oh man, this brings back the days when I wrote special debug-version malloc and free code to try to track down heap corruption due to malloc / free misuse (in code I had contributed to). Stuff like kbyte-long boundary buffers with bit patterns in them, and all sorts of lookaside lists run in parallel with libc's default code. Those bug-detectors worked OK. Hard-nosed code inspection worked far better.<p>As an old-timer in writing code, I think my generation's most-challenging legacies (=== the things we f**ked up) are the non-robust malloc/free concept and null-terminated text strings. Bugs in code using those conventions have been exploitable to a really damaging extent. I learned to program in C from K&R. And getting data-structure code right, and safe to deploy, in that language and its derivatives is HARD.<p>The C inventors are (were in Dennis Ritchie's case) brilliant and Bell Labs was great. Their ideas shaped the stuff the global internet runs on. But these parts of what they invented ..... ouch. (All OSs had the same problems.)<p>I wish the wonderful article presented here carried a more prominent warning about this. Many will read it as they learn to code. The history of our profession can teach about what NOT to do as well as what to do.
All credit goes to my wonderful friend Sam Who: <a href="https://twitter.com/samwhoo" rel="nofollow">https://twitter.com/samwhoo</a><p>He recently published a similar guide on load balancing that is equally intuitive and insightful: <a href="https://samwho.dev/load-balancing/" rel="nofollow">https://samwho.dev/load-balancing/</a><p>I can't wait to see what he puts out next!
This is wonderful! I'm definitely going to be sharing this with my students (college sophomores studying CS).<p>If I were to make some suggestions, based on how I know they would receive this:<p>- I would make explicit reference to the heap and stack. Students who are learning this material are learning about the heap/stack dichotomy, and I think it would really improve the exposition to make clear that not all memory is allocated this way in a program.<p>- I would remove this line: "At the end of this post, you should know everything you need to know to write your own allocator." I can confidently say that my students will not be able to write an allocator after reading this. It's nothing to do with your piece; it's just that the intersection of people who don't understand hex and people who could build an allocator after a short blog post is very, very small. Students who read this post and at the end still feel like they can't build an allocator will end up discouraged, with a feeling that maybe they are missing something.<p>- Consider rethinking how you handle hex numbers throughout. You introduce them and say they are distinguished by a preceding "0x", but then you immediately omit the 0x to save space in the figure. In my experience, students have <i>a lot</i> of trouble understanding that hex and dec have the same underlying representation. They will not be able to distinguish between hex and dec bases implicitly, so from a pedagogical standpoint it's better to be consistent throughout and include the prefix.<p>- Finally, at the end I would mention that there are other strategies for memory management, to encourage further exploration. Other languages do it differently, and for students it's important to know which other avenues are out there. 
Otherwise they might think heap-based malloc/free is the only way to do things, the same way they often think imperative programming is the only way to program.<p>Anyway, thank you for creating this, and I'm sure it will really help my students. In a just world, "seeing" the memory like this would ideally be first-class tooling for languages like C.
Thank you for this, this is helpful.<p>I wrote a JIT compiler and I didn't bother calling free much, I just let the operating system free up all allocated memory.<p>I got into this situation often:<p><pre><code> return_struct = do_something(mystruct);
return_struct->inner_struct = malloc(sizeof(struct my_inner_struct));
</code></pre>
Now, who owns inner_struct? Who is responsible for freeing it? Do I free it when I assign to it?<p>I feel this ownership complicates cross-language FFI API calls, because who is responsible for freeing structures depends on the application and the platform you're running under. For example, Rust code being called from Erlang. You have to be extra careful when memory is managed by a different language runtime.<p>I feel I am at the beginning of intuitively understanding what memory really is: memory is just a huge contiguous region of numbered locations. Bump allocators allocate in one direction and free all at once. Arena allocators allocate from a preallocated region, I think.<p>Memory is a logistical problem of how you arrange and allocate finite resources.<p>I am thinking of alternative ways of visualizing memory; for example, I started writing an animation of binary search:<p><a href="https://replit.com/@Chronological/ProgrammingRTS">https://replit.com/@Chronological/ProgrammingRTS</a><p>The idea is that you see values and memory locations move around, with the final goal of being able to command memory to move around and be computed on, like in a real-time strategy game.<p>I think if we could visualise memory as cars on a road, we would see obvious traffic jams.
Not really related, but the feeling of elation when I alloc'd 1M of RAM in OS/2 and it _didn't_ swap changed me.<p>This was on a 386sx with 8M RAM and it was pretty much all the available memory after the OS was loaded and settled down.<p>A MILLION BYTES!!<p>Didn't do anything with it, but still, after DOS and EMS/XMS memory and all the other falderol of managing memory. (At the time, it was also the only x86 OS that would format a floppy drive without bringing all the other threads to a crawl. UI was still available-ish, BiModem was still BiModeming...)
Sam, this is such a wonderful resource that you've put out into the world. Thank you for the time and care you've put into each paragraph and interactive component. You've not only given me a lot to think about in terms of my basic assumptions about memory, but also about how to write and teach better online. I'm really excited to start following you!
Nice article on general-purpose memory allocation approaches. While the article doesn't explicitly discuss it, these all seem to be list-allocation mechanisms?<p>> "List allocators used by malloc() and free() are, by necessity, general purpose allocators that aren’t optimized for any particular memory allocation pattern... To understand, let’s examine commonly used list allocation algorithms: first-fit, next-fit, best-fit and quick-fit."<p>That's from an article on custom memory management (aimed at embedded programming issues) I've found pretty useful, and it picks up where this leaves off:<p><a href="https://www.embedded-software-engineering.de/dynamic-memory-allocation-justifiably-taboo-a-632951/?p=3" rel="nofollow">https://www.embedded-software-engineering.de/dynamic-memory-...</a><p>You can practice writing custom memory managers in an application that runs on an operating system by only using the stack (just create a big array of int etc. and call that your memory space):<p>> "For the safety-critical tasks, the developer can replace the standard allocator with a custom allocator that sets aside a buffer for the exclusive use of that task, and satisfies all memory allocation requests out of that buffer... The remainder of this paper presents four examples of custom allocator algorithms: the block allocator, stack allocator, bitmap allocator and thread-local allocator. Custom memory managers can use any one of these algorithms or a combination of different algorithms."
> <i>Others will return what's called a "null pointer", a special pointer that will crash your program if you try to read or write the memory it points to.</i><p>This is not strictly true, it depends on the environment you're using. Some older operating systems and some modern embedded systems have memory mapped at the zero address.
The only thing that confused me is how it said we can know the location of the block after and before by calculating:<p><pre><code> address + <value at address>
address - <value at address-1>
</code></pre>
Shouldn't this be?<p><pre><code> address + <value at address> + 3
address - <value at address-1> - 3</code></pre>
> "There's no shortage of information about memory allocators on the Internet, and if you've read this far you should be well-placed to dive in to it.
Join the discussion on Hacker News! <a href="https://news.ycombinator.com/item?id=36029087" rel="nofollow">https://news.ycombinator.com/item?id=36029087</a> "<p>Interesting to use hacker news as the blog's own comment section in this way.
Seems to be a bug on the first interactive graph, at least for me. Unless I'm misunderstanding the point of the graph, `malloc(7)` only allocates 2 bytes.
It even has a playground! <a href="https://samwho.dev/allocator-playground/" rel="nofollow">https://samwho.dev/allocator-playground/</a><p>How I wish I had something like that when I first learned C.
Ahh yes. Back at university we had a class called efficient programming where we had to implement a Unix utility and optimize it for speed, meaning CPU cycles.<p>While aggressively optimizing, we ended up replacing malloc with a function we called "cooloc", which traded portability and security for speed. The debug tool here would have been useful.
When writing C, I tend to avoid calling malloc and free directly.<p>* <a href="https://github.com/DaveJarvis/mandelbrot/blob/master/memory.c">https://github.com/DaveJarvis/mandelbrot/blob/master/memory....</a><p>I then apply this same principle of "opening" and "closing" structures throughout the application. Generally, I can quickly verify that the calls are balanced:<p>* <a href="https://github.com/DaveJarvis/mandelbrot/blob/master/threads.c">https://github.com/DaveJarvis/mandelbrot/blob/master/threads...</a><p>What's nice about this pattern is that the underlying implementation of exactly how the memory is maintained for a particular data structure (e.g., whether malloc or gdImageCreateTrueColor is called) becomes an implementation detail:<p>* <a href="https://github.com/DaveJarvis/mandelbrot/blob/master/image.c">https://github.com/DaveJarvis/mandelbrot/blob/master/image.c</a><p>The main application opens then closes structures in the reverse order:<p>* <a href="https://github.com/DaveJarvis/mandelbrot/blob/master/main.c">https://github.com/DaveJarvis/mandelbrot/blob/master/main.c</a><p>This means there's only one call to malloc and one call to free throughout the entire application (third-party libraries notwithstanding), allowing them to be swapped out with ease.<p>Aside, logging can follow this same idea by restricting where any text is written to the console to a single location in the code base:<p>* <a href="https://github.com/DaveJarvis/mandelbrot/blob/master/logging.c#L30">https://github.com/DaveJarvis/mandelbrot/blob/master/logging...</a>
This is a nice read too (the whole blog actually)<p><a href="http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/" rel="nofollow">http://igoro.com/archive/volatile-keyword-in-c-memory-model-...</a><p>It's a bit old by now (2010), but I always remembered the mental model Igor created.
I love this so much, thank you for putting this together!<p>My only piece of feedback would be for the "Inline Bookkeeping" section (<a href="https://samwho.dev/memory-allocation/#inline-bookkeeping" rel="nofollow">https://samwho.dev/memory-allocation/#inline-bookkeeping</a>), it took a while for me to grok the numbered list to figure out which block corresponded to address + X. I wonder if there is a better way to visualize the 4 numbered bullet points? Maybe just arrows and text pointing to the visualization?<p>Thanks again for this wonderful article!
Love this. It is so important to know these fundamental concepts. I would like to see a similar version for SQL database indexes, as 99% of the engineers I work with have no idea how to use them.
Great webpage, but it somehow left me more confused.<p>It seems to imply that malloc/free work by boundary tags? Which I don't think is the case? (And if not, it leaves the reader confused as to how they actually work.)<p>I know "some" languages use the tag technique (e.g. Julia), but the article seems to imply that this also applies to the C code above? Or am I wrong, and C also makes use of such a scheme when you use pointer arithmetic? (I don't see how that would work with arrays if that's the case, though.)
Glad there are always talented people and great guides like him/this! Great people and great guides will help tremendous numbers of learners along the way. Timeless, priceless.
So good! I wish I had an article like this to learn from when I implemented my language's memory allocator a few months ago! Wonderful, thank you!
I love this! I wish it existed back when I was writing my first memory allocators in university, or when building a custom EntityComponentSystem implementation.<p>I'd love to also see applications of custom memory allocators. I know about use cases in building game engines and the importance of hitting cache there, but I'm not sure where else in the world this would be as useful.
Somewhat relevant about how GCs work, with also very great visuals: <a href="https://spin.atomicobject.com/2014/09/03/visualizing-garbage-collection-algorithms/" rel="nofollow">https://spin.atomicobject.com/2014/09/03/visualizing-garbage...</a>
Excellent, excellent article! I have a question though.<p>> <i>Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some sort of defragmentation process?</i><p>> <i>Sadly not. Remember earlier we talked about how the return value of malloc is the address of a byte in memory? Moving allocations won't change the pointers we have already returned from malloc. We would change the value those pointers are pointed at, effectively breaking them. This is one of the downsides of the malloc/free API.</i><p>But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new adresses transparently for the programs? Or would it be too much overhead for too little benefit?
This is really, really well done. Also, the allocator playground[0] is really cool. Will be my go-to when teaching this topic moving forward :)<p>[0] <a href="https://samwho.dev/allocator-playground/" rel="nofollow">https://samwho.dev/allocator-playground/</a>
I think it would be helpful if the article discussed internal fragmentation, where the payload is smaller than the allocated block. I found this was important to understanding the purpose of various memory allocation algorithms when undertaking the malloc lab in college.
Can you highlight the specific malloc() calls in the interactive part? It confused me when it said malloc(5) because it looks almost exactly like malloc(4). Specifically highlighting the malloc(5) allocation would make that a bit clearer I think.
This is the future of learning. If you have a skill in it, learn to monetize it. It will benefit both of us, as you're more incentivized to keep doing it if you can make money.
I really like the website design, above all. I'd love to pen my thoughts online, but haven't had the energy to learn static site generation...
I've played around quite a bit with slab allocation in C to get my interpreters to run faster, and this post inspired me to do a quick benchmark of a design I've been iterating.<p>malloc/free: 207294429ns<p>slab: 74795526ns<p>A 64% reduction in runtime (roughly 2.8x faster) is pretty nice.<p><a href="https://github.com/codr7/libcnabl">https://github.com/codr7/libcnabl</a>
I was having some fun recently just seeing how fast we can allocate large chunks of memory.<p>There's something refreshing about firing up a C (or maybe now Zig, in my case) compiler and allocating a gigabyte of memory, and seeing that your process is using exactly 1GB.
While I do like the article, I wish it simply used multiple images and/or animated gifs (instead of javascript) for the pictorials.<p>It would make the site much more accessible and clear in the event you didn't realize you had to click forward.
It would have been nice if I had seen this article before refactoring a Java project into a C project, where learning memory allocation rules in order to use C was a painful process for me :(
Love the visualizations. It would be great to have `realloc` covered too but I'm not sure how much that varies by system, potentially making it too complicated for this sort of post.
I did work focused on optimizing the latency impacts of mem allocation for a recent client of mine. Loved doing that kind of low-level systems tuning. (Linux, C, mmap and friends.)
The asides with the dog are great. I've seen a few blogs use this technique -- I'm not big on Greek, but I think Plato used this technique too.<p>It instills the idea that you should be asking a question at this point. Some information has been provided that should generate more questions in your head, if you're keeping pace.