"What's possible, though, is accumulating Python objects in memory and keeping strong references to them. For instance, this happens when we build a cache (for example, a dict) that we never clear. The cache will hold references to every single item and the GC will never be able to destroy them, that is, unless the cache goes out of scope."<p>Back when I worked with a Java memory profiling tool (JProbe!) we called these "lingerers". Not leaks, but the behaviour was similar.
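For anyone who wants to see a "lingerer" in action, here is a minimal sketch, assuming CPython. The `Item` class, the `cache` dict and the `load` helper are hypothetical names made up for illustration; a weak reference is used only as a probe to check whether the object is still alive.

```python
import gc
import weakref


class Item:
    """Hypothetical cached value; a plain class so it can be weakly referenced."""


cache = {}  # module-level cache that is never cleared on its own


def load(key):
    # Populate the cache on first access and return the cached object.
    if key not in cache:
        cache[key] = Item()
    return cache[key]


# The weak reference does not keep the object alive; only the cache does.
probe = weakref.ref(load("x"))

gc.collect()
alive_while_cached = probe() is not None  # the cache still holds a strong reference

cache.clear()
gc.collect()
alive_after_clear = probe() is not None  # nothing references the object any more
```

The object is not a leak in the C sense, since clearing the cache releases it, but as long as the dict stays reachable the GC has no reason to touch it.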
The JVM community tends to prefer pure Java implementations of everything, rather than wrapping existing C libraries the way Python and Ruby do. Some may see this as a bad thing, but it definitely has its benefits. One particularly relevant benefit in the context of this article is that the amount of code that can leak memory, in the conventional sense, is dramatically reduced. I suppose the same thing is happening in the Node.js ecosystem, though I don't recall if Node uses native code to parse XML.
tl;dr: libxml2's C implementation leaked memory, and the author tracked it down. Kudos to the author for their persistence in digging down to the root of the problem. A lot of people would throw their hands up and decide to recycle the process every <N> seconds rather than analyze it to the depth the author did.
> "But if we're strictly speaking about Python objects within pure Python code, then no, memory leaks are not possible - at least not in the traditional sense of the term. The reason is that Python has its own garbage collector (GC), so it should take care of cleaning up unused objects."<p>I have a hard time believing this. Java can have memory leaks, so why couldn't Python?
I've used the gc module, with get_referrers and get_referents, to track down various leaks. This only really helps with Python-allocated objects.<p>It's trivial to end up with an unexpected strong reference. Weak references are the right way to deal with cache objects, imho.
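A rough sketch of both ideas, assuming CPython: `gc.get_referrers` and `gc.get_referents` to find who is holding an object, and `weakref.WeakValueDictionary` as a cache that does not keep its values alive. The `Payload` class and the cache names are hypothetical, made up for illustration.

```python
import gc
import weakref


class Payload:
    """Hypothetical cached object; a plain class so it supports weak references."""

    def __init__(self, name):
        self.name = name


# A regular dict holds a strong reference to everything stored in it.
strong_cache = {}
obj = Payload("a")
strong_cache["a"] = obj

# get_referrers answers "who points at obj?"; the cache dict is one of them.
held_by_cache = any(r is strong_cache for r in gc.get_referrers(obj))

# get_referents answers the opposite question: "what does the cache point at?"
cache_points_at_obj = any(r is obj for r in gc.get_referents(strong_cache))

# A WeakValueDictionary drops an entry once the last strong reference is gone.
weak_cache = weakref.WeakValueDictionary()
other = Payload("b")
weak_cache["b"] = other
del other      # remove the only strong reference
gc.collect()   # not strictly needed on CPython, but harmless
weak_entry_gone = "b" not in weak_cache
```

As noted above, this only sees objects the Python runtime knows about; memory allocated inside a C extension will not show up in either call.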
Reminds me of the time I traced a memory leak in a Node app by loading a core dump into an illumos VM with mdb_v8. Not so simple/friendly/happy after all.<p>(You could argue that I could have generated a heap snapshot with v8-profiler, but I was pressed for time.)