With C/C++, one can get A LOT of performance by writing a custom memory allocator that fits the usage patterns of a large-scale app.

I designed a custom allocator, hooked up through new/delete operator overloads, for a C++ OO app. You can think of the app as something like MS Word: when you open or create a new doc, you need a lot of malloc() calls. In my case it was usually between a few million and a few tens to hundreds of billions of records.

There was a lot of overhead in the standard new/delete. After profiling, I ended up writing my own allocator with the following properties:

* It malloc()s 1, 2, 4, 8, 16, 32, 64 MB at a time (progressively increasing, to optimize the app's RAM footprint for both small and large docs).

* All the large-block allocations are associated with the "Doc/DB".

* When the Doc closes, only the few large blocks need to be freed. This change made doc/DB close operations go from 30+ seconds for a large Doc/DB to less than 1 second.

* I later modified the allocator to get the large blocks directly from an mmap() call, so all the memory it returns is automatically persistent (file-backed). The save operation also went from 30+ seconds for a large, multi-GB DB to < 1 second. (Just close the file and the OS handles all the flushing, etc.)

Without the ability to customize the memory allocator, plus pointer manipulation, I can't figure out how to get similar performance for a similar type of large-scale app in Golang, Java, etc.
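For readers who want to see the shape of the block-based scheme described in the bullets, here is a minimal sketch. It assumes a single-threaded, per-document arena backed by malloc() (not the mmap() variant). The class name `DocArena`, its methods, and the 1 MB to 64 MB doubling policy are illustrative choices modeled on the description, not the author's actual code. Objects are placement-new'ed into large blocks and never freed individually; closing the doc calls `release()`, which frees only the handful of large blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical per-document arena (illustrative, not the original allocator).
// It grabs large blocks (1 MB, 2 MB, 4 MB, ... capped at 64 MB) and hands out
// small chunks by bumping an offset. Nothing is freed individually; all blocks
// are released together when the document closes.
class DocArena {
public:
    DocArena() = default;
    DocArena(const DocArena&) = delete;
    DocArena& operator=(const DocArena&) = delete;
    ~DocArena() { release(); }

    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        // Alignment must be a power of two no larger than what malloc()
        // guarantees, so aligning the offset also aligns the address.
        assert(align != 0 && (align & (align - 1)) == 0 &&
               align <= alignof(std::max_align_t));
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || offset + size > cap_) {
            grow(size);   // fresh block: offset 0 is suitably aligned
            offset = 0;
        }
        used_ = offset + size;
        return blocks_.back() + offset;
    }

    // Frees every block at once: cost is proportional to the number of
    // blocks, not the number of objects allocated from them.
    void release() {
        for (char* b : blocks_) std::free(b);
        blocks_.clear();
        cap_ = used_ = 0;
        next_size_ = kMinBlock;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    static constexpr std::size_t kMinBlock = std::size_t{1} << 20;   // 1 MB
    static constexpr std::size_t kMaxBlock = std::size_t{64} << 20;  // 64 MB

    void grow(std::size_t min_size) {
        std::size_t size = next_size_;
        while (size < min_size) size *= 2;  // oversized requests get a bigger block
        blocks_.reserve(blocks_.size() + 1);  // so push_back cannot throw after malloc
        char* b = static_cast<char*>(std::malloc(size));
        if (b == nullptr) throw std::bad_alloc();
        blocks_.push_back(b);
        cap_ = size;
        used_ = 0;
        if (next_size_ < kMaxBlock) next_size_ *= 2;  // progressive growth
    }

    std::vector<char*> blocks_;
    std::size_t cap_ = 0;
    std::size_t used_ = 0;
    std::size_t next_size_ = kMinBlock;
};
```

Usage is `new (arena.allocate(sizeof(T), alignof(T))) T{...}` for each record, then `arena.release()` when the doc closes. Because nothing is freed per object, destructors never run, so this fits trivially destructible records; otherwise the owner must run destructors before `release()`. An mmap()-backed variant would swap malloc()/free() for mmap()/munmap() on the DB file, but raw pointers stored inside a file-backed mapping are only valid if the file is mapped at the same address next time, so such designs typically store offsets or map at a fixed address.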