Looks good, but more work is needed.<p>The case of a suboptimal memory allocator - is the problem of shared linking.
Ideally, we should make a "self-contained" shared library. It means that every dependent library, including malloc, is statically linked into it, and the set of exported symbols is limited to a minimum. But I'm unsure if it will work with Python - never used pybind.