I, too, read the title as meaning the opposite of what he's actually saying. He's saying that if you force your callers to do the allocation, the way Golang's io.Reader.Read does, rather than doing the allocation yourself and forcing the new allocation on them (the way Python's fileobj.read call does), then they can reuse and pool allocations, improving efficiency substantially under some circumstances.

There are a couple of other benefits, though:

1. A call which allocates can always fail, although in garbage-collected languages like Golang, memory allocation failures are normally handled implicitly with something like panic(). (That's because the main benefit of garbage collection is not that you don't have to debug segfaults, but that you can treat arbitrary-sized data structures as if they were atomic integers — with immutability, even when they share structure. This allows you to program at a higher level. You lose this benefit if you have to know which calls allocate and which do not.)

1½. By the same token, dynamic allocation almost never takes a deterministic or bounded amount of time, though see below about per-frame heaps.

2. In many cases, the caller can do the allocation on the stack rather than the heap, with potentially substantial implications for garbage-collection efficiency. (You could see the MLKit's region allocation as an automatic version of this pattern.)

3. In environments like C++, it may make sense to use a different custom allocator, like the per-frame heap commonly used in game engines, or the pool allocator provided by the APR, or an allocator that allocates from a memory-mapped object database like ObjectStore. The alternative is something like the rarely-used allocator template parameter that makes all our STL compile errors so terrible. Even in ordinary C programs, it might make sense to use a static allocation for these things some of the time.

On the other hand, putting the allocation on the caller ossifies the amount of space needed as part of your API, and prevents that space from being dynamically determined at runtime by the algorithm. So there are times when letting the caller allocate all the space is undesirable or even impossible. And, as Cheney points out, it does make your API more of a hassle to invoke — but that's easily fixed with a wrapper function that does the allocation (see the sketch at the end of this comment).

A third alternative, used for example by Numpy and realloc(), is the one mentioned in https://news.ycombinator.com/item?id=20888920 (though it erroneously implies that C has optional parameters) — have an optional buffer argument telling where to store the result, but if it's not passed in, dynamically allocate a buffer and return it. Taking advantage of this parameter to avoid extra allocations and cache misses commonly doubles the speed of Numpy code, in my experience, but it really hurts readability. As a general API design technique, this seems like the most constraining to future implementations, and I think it makes more sense to separate the underlying stable non-allocating API from the allocating helper wrapper, as described above.
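
To make the wrapper-function suggestion concrete, here's a minimal Go sketch. The package and the names ReadFrame and ReadFrameAlloc are made up for illustration, not from any real library: the underlying API fills a caller-supplied buffer, io.Reader-style, and a thin wrapper allocates for callers who don't care.

    package frames // hypothetical package, for illustration only

    import "io"

    // ReadFrame fills a caller-supplied buffer, io.Reader-style. Because the
    // caller owns buf, it can come from a sync.Pool, a reused slice, or (for
    // small fixed sizes) the stack.
    func ReadFrame(r io.Reader, buf []byte) (int, error) {
        return io.ReadFull(r, buf)
    }

    // ReadFrameAlloc is the allocating convenience wrapper: like Python's
    // fileobj.read, it allocates a fresh buffer on every call.
    func ReadFrameAlloc(r io.Reader, n int) ([]byte, error) {
        buf := make([]byte, n)
        read, err := ReadFrame(r, buf)
        return buf[:read], err
    }

A caller that cares about allocation creates buf once, or takes it from a sync.Pool, and reuses it across calls; everyone else just calls the wrapper, and the stable non-allocating API underneath never changes.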
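
And a rough Go version of the third alternative (again a made-up function, just to illustrate the shape): the caller may supply storage, and allocation is only the fallback.

    package frames // hypothetical package, for illustration only

    // FillOrAlloc writes n result bytes into dst if it has enough capacity
    // and otherwise allocates, in the spirit of Numpy's out= parameter and
    // realloc(): the caller may supply storage, but doesn't have to.
    func FillOrAlloc(dst []byte, n int) []byte {
        if cap(dst) < n {
            dst = make([]byte, n)
        }
        dst = dst[:n]
        for i := range dst {
            dst[i] = byte(i) // stand-in for the real computation
        }
        return dst
    }

Callers who pass nil (or a too-small slice) pay for an allocation every time; callers who feed the previous result back in pay nothing, which is where the Numpy-style speedups come from, at the readability cost mentioned above. Go's standard library expresses the same shape with its append-style APIs, e.g. strconv.AppendInt and hash.Hash.Sum.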