This is a great post. I like how they walked through all the steps and especially the "perf" tool.<p>Ruby has a patch to do the same thing -- increase sharing by moving reference counts out of the object itself:<p>Index here:<p><a href="http://www.rubyenterpriseedition.com/faq.html#what_is_this" rel="nofollow">http://www.rubyenterpriseedition.com/faq.html#what_is_this</a><p>First post in a long series:<p><a href="http://izumi.plan99.net/blog/index.php/2007/07/25/making-rubys-garbage-collector-copy-on-write-friendly/" rel="nofollow">http://izumi.plan99.net/blog/index.php/2007/07/25/making-rub...</a><p>I think these patches or something similar may have made it into Ruby 2.0:<p><a href="http://patshaughnessy.net/2012/3/23/why-you-should-be-excited-about-garbage-collection-in-ruby-2-0" rel="nofollow">http://patshaughnessy.net/2012/3/23/why-you-should-be-excite...</a><p><a href="https://medium.com/@rcdexta/whats-the-deal-with-ruby-gc-and-copy-on-write-f5eddef21485#.10aa2bnnw" rel="nofollow">https://medium.com/@rcdexta/whats-the-deal-with-ruby-gc-and-...</a><p>The Dalvik VM (now replaced by ART) also did this to run on phones with 64 MiB of memory:<p><a href="https://www.youtube.com/watch?v=ptjedOZEXPM" rel="nofollow">https://www.youtube.com/watch?v=ptjedOZEXPM</a><p>I think PHP might do it too. It feels like Python should be doing this as well.
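CPython never did move refcounts out of the object header, but it did eventually get a related CoW-friendly knob: `gc.freeze()` (added in Python 3.7, motivated by exactly this fork-and-share workload), which moves everything tracked so far into a permanent generation so the cycle collector never touches, and therefore never dirties, those pages. A minimal sketch of the pattern (POSIX-only because of `os.fork`; `shared_config` is a made-up stand-in for real preloaded data):

```python
import gc
import os

# Long-lived shared data, loaded in the master before forking.
shared_config = {i: str(i) for i in range(100_000)}

gc.disable()   # avoid a collection sneaking in between freeze and fork
gc.freeze()    # move everything tracked so far into the permanent
               # generation, so the cycle collector never touches
               # (and never dirties) those pages again

pid = os.fork()
if pid == 0:
    gc.enable()              # the child collects only post-fork garbage
    _ = shared_config[42]    # reads stay on pages shared with the master
    os._exit(0)

os.waitpid(pid, 0)
gc.enable()
```

Note this only keeps the *collector* off the shared pages; plain refcount updates (`Py_INCREF`/`Py_DECREF`) still write into the object headers, so it is a partial fix compared to the Ruby approach.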
It's basically "cheating" at GC by exploiting a very narrow use case. I saw a trick like this at Smalltalk Solutions in 2000 with a 3D game debugging tool. The "GC" actually simply threw everything away for each frame tick.<p>Someone needs to come up with something like a functional language based on a trick like this. Or maybe a meta-language akin to RPython, so people can write domain specific little languages for doing things like serving web requests, combined with domain specific "cheating" GC that can get away with doing much less work than a full general purpose GC.<p>Couldn't a pure functional programming environment be structured to allow for such GC "cheating?"
I find Instagram's engineering blog really good (I especially like their content on PostgreSQL), and it seems like they implemented a solid solution to a problem they were facing.<p>That being said, I wonder if their team considered adopting a different language designed to work without GC overhead. I'm all for working with something you're familiar with, but they seem to have hit the point where they know the problem surface area well enough to start optimizing for more than a 10% gain by turning off a selling point of safer languages.
Nice. I worked on something like this at an internship. I wrote a Unicorn-like preload-fork multiprocess server in Ruby (for other reasons).<p>I realized that the workload (which involved a large amount of long-lived static data on the heap) would have seen enormous memory savings, if only we weren't running with Ruby 1.9's mark-and-sweep GC algorithm that marked every object during the mark phase.<p>I briefly experimented with turning off GC and periodically killing workers. Thankfully, in <i>that</i> situation, all we actually had to do was upgrade to Ruby 2.2, which does have a proper CoW-friendly incremental GC algorithm.<p>`fork` is awesome.
One of their issues was that Python runs a final GC call before process exit. Why <i>does</i> Python run that final GC call if the process is exiting anyway?
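For context, that final collection is part of interpreter finalization (`Py_FinalizeEx` walks the heap so destructors and cleanup code fire before teardown). The escape hatch is `os._exit`, which skips finalization entirely; a small demonstration of the trade-off (note it also skips `atexit` hooks, so flush anything important first):

```python
import subprocess
import sys

# sys.exit() goes through full interpreter finalization: atexit hooks,
# module teardown, and a final pass over the heap.
normal = subprocess.run(
    [sys.executable, "-c",
     "import atexit, sys;"
     "atexit.register(lambda: print('cleanup ran'));"
     "sys.exit(3)"],
    capture_output=True, text=True)

# os._exit() lets the kernel reclaim the pages wholesale: no final GC,
# no destructors, no atexit -- which is the point for a worker whose
# entire address space is about to vanish anyway.
fast = subprocess.run(
    [sys.executable, "-c",
     "import atexit, os;"
     "atexit.register(lambda: print('cleanup ran'));"
     "os._exit(3)"],
    capture_output=True, text=True)

print(normal.stdout.strip())   # cleanup ran
print(fast.stdout)             # (empty: atexit was skipped)
```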
Is it just me, or does this look like the typical short-term hack that will blow up in your face pretty quickly and turn your life into a constant stream of low-level tinkering?<p>I suppose the people at Instagram didn't just stop there, but are also planning a longer-term solution to optimizing their stack (i.e. migration to a more performant language).
> Instagram can run 10% more efficiently<p>Seems quite risky/costly for a mere 10% computational efficiency gain. If you're going to change the memory model of a programming language, you might as well shoot for a <i>10x</i> improvement instead of 10%.
Fun fact: Lisp originally had no GC. It just allocated and allocated memory till there was none left, and then it died, after which the user dumped their working heap to tape, restarted Lisp, and loaded the heap back from tape. Since only the "live" objects were actually written, the heap took up less memory than before and the user could keep going.
<i>Instagram’s web server runs on Django in a multi-process mode with a master process that forks itself to create dozens of worker processes that take incoming user requests.</i><p>So this is all a workaround for Python's inability to use threads effectively. Instead of one process with lots of threads, they have many processes with shared memory.
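That preload-then-fork pattern can be sketched in a few lines (a toy example: the data, the one-request handler, and `run_demo` are all made up for illustration, and `os.fork` makes it POSIX-only):

```python
import os
import socket

# Preload shared, read-only data in the master before forking.
ROUTES = {f"/page/{i}": f"content {i}" for i in range(10_000)}

def handle_one(listener):
    # A worker handles a single request using the inherited listener.
    conn, _ = listener.accept()
    path = conn.recv(1024).decode().strip()
    conn.sendall(ROUTES.get(path, "404").encode())
    conn.close()

def run_demo():
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:               # worker: shares ROUTES pages via CoW
        handle_one(listener)
        os._exit(0)

    # Master acts as a client here just to exercise the worker.
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"/page/7\n")
    reply = client.recv(1024).decode()
    client.close()
    os.waitpid(pid, 0)
    listener.close()
    return reply
```

As long as the workers only read `ROUTES`, those pages stay physically shared between all processes; the article's whole problem is that GC bookkeeping writes break that sharing.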
Noting that some other library might call `gc.enable()` is correct. But then ignoring the fact that another library can just as easily call `gc.set_threshold(n)` with some n > 0 seems like a bug waiting to happen -- it's the same issue as something calling `gc.enable()`.
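For illustration, the two switches really are independent in CPython's `gc` module, so guarding one doesn't guard the other:

```python
import gc

gc.set_threshold(0)        # one way to stop automatic collection...
assert gc.isenabled()      # ...but the GC still reports "enabled"

gc.disable()                   # the other switch
gc.set_threshold(700, 10, 10)  # a library resetting thresholds does
assert not gc.isenabled()      # NOT undo gc.disable()...

gc.enable()                # ...while a library calling gc.enable()
                           # silently reactivates collection, with
                           # whatever thresholds happen to be set
assert gc.isenabled() and gc.get_threshold()[0] == 700
```

So an app that "disables" GC via `gc.set_threshold(0)` is one stray `gc.set_threshold(700, ...)` away from collecting again, exactly as the comment suggests.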
This is called out-of-band GC. We've been doing it for years in Ruby with Unicorn: <a href="https://blog.newrelic.com/2013/05/28/unicorn-rawk-kick-gc-out-of-the-band/" rel="nofollow">https://blog.newrelic.com/2013/05/28/unicorn-rawk-kick-gc-ou...</a><p>However, when the Ruby community moved to Puma, which is based on both processes and threads, it was needed less. None of this is rocket science (it's still far behind the JVM and .NET); I assume a hybrid process/thread model is something that just hasn't reached critical mass in the Python/Django/Flask/Bottle community?
They mentioned msgpack was calling gc.enable(), but it looks like that issue was fixed quite a while ago in version 0.2.2:<p><a href="https://github.com/msgpack/msgpack-python/blob/2481c64cf162d765bfb84bf8e85f0e9861059cbc/ChangeLog.rst#bugs-fixed-10" rel="nofollow">https://github.com/msgpack/msgpack-python/blob/2481c64cf162d...</a>
This writing feels a little sloppy.<p>> At Instagram, we do the simple thing first. [...] Lesson learned: prove your theory before going for it.<p>So do they no longer do the simple thing first?<p>More on topic: this seems like an optimization that might really constrain them down the road. Now, if anyone creates reference cycles that ref-counting alone can't reclaim, they will get OOMs.
Carl Meyer, a Django core dev, presented at Django Under the Hood on using Django at Instagram. It was a really good talk that goes through how they scaled and what metrics they use for measuring performance. <a href="https://youtu.be/lx5WQjXLlq8" rel="nofollow">https://youtu.be/lx5WQjXLlq8</a>
I actually didn't know that CPython had a way of breaking reference cycles. I seem to remember reading that reference counting was the only form of garbage collection that CPython did. Maybe this was the case in the past?
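It was: CPython has had a supplemental cycle collector (the `gc` module) since Python 2.0; before that, refcounting really was the only mechanism. A quick demonstration of the difference:

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a <-> b: refcounts never drop to 0

gc.disable()          # rule out a collection sneaking in mid-demo
gc.collect()          # start from a clean slate
make_cycle()          # plain refcounting would leak this pair forever
freed = gc.collect()  # the cycle detector finds and reclaims them
gc.enable()
print(freed)          # at least 2 (the two Nodes, plus their __dict__s)
```

This is exactly why the article's "refcounting only" setup is risky: anything caught in a cycle after `gc` is shut off stays resident until the process dies.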
> Each CoW triggers a page fault in the process.<p>Maybe I've misunderstood how page faults work, but I thought the causality was reversed: each page fault (on a write to a shared page) triggers a CoW, not the other way around?
Using threading to handle user requests with Python seems very wrong to me. They might see solid improvement by ditching WSGI and employing a non-blocking solution (like Tornado, aiohttp or Sanic), running on PyPy as multiple instances behind a load balancer.
Instead of a bunch of hacks that are obviously going to blow up in someone's face one day, why not just use a more suitable platform?<p>Forking worker processes for web requests is so old school... And Python is a terrible choice for something at their scale.<p>Just redo the hosting bit in Java or Go and call it a day. If their UI code is sufficiently isolated from the back end, it's not a huge deal.<p>Instagram is a pretty small application feature-wise; a few devs could probably do it in a couple of months.
If you think about it, this approach is actually very similar to the FaaS/Lambda/Serverless model. Each request lives in its own little container which gets thrown away after every execution. This approach means you reduce the amount of shared state and lots of problems like garbage collection either get easier or go away.