Python's Hardest Problem (2012)

97 points by hartleybrody almost 12 years ago

13 comments

pekk almost 12 years ago
This was FUD when it was posted and now it is age-old FUD which there's no point in re-posting.

Jython and IronPython do not have a GIL. Multiprocessing avoids the GIL. Blocking on I/O gives up the GIL. There are all kinds of techniques used instead of throwing threads naively at every problem. And, conveniently, none of this is mentioned in the article. Either the author was not aware of these basic facts, or suppressed them.

It is blatantly false that "no single issue has caused more frustration or curiosity for Python novices and experts alike than the Global Interpreter Lock." The author may consider it important, but this does not mean that the author is speaking for everyone else.

Novices would have good reason to avoid shared-everything threading, which introduces piles of race conditions and difficulty controlling runaway threads, and should try simpler tools first and see whether they can get good results instead of prematurely optimizing with techniques they don't know how to use.

Experts will know that the GIL is often not a primary concern, and where it actually is a concern they'll be conversant with other tools like multiprocessing and task queues.

The people with the most to say about the GIL are mediocre programmers who want to show off that they are so good Python is limiting them, and people not very familiar with Python (possibly with a background in languages which try to make threads the answer to everything) who have an axe to grind.

Instead of asking how to do what they want to do, they just assume that the problem is the GIL and there is no solution, then expect to be praised for their technical acumen. People with technical acumen just solve the problem in any of the available ways instead of bitching in public about how it's the tool's fault they can't solve the problem because they have defined the problem incorrectly and insist on some arbitrary way of doing it.
revelation almost 12 years ago
I've written native C modules interacting with EVE Online, probably one of the biggest Python systems in deployment, and they run on Stackless Python with a bazillion threads for every little thing, while still needing to hit that 50fps window.

The GIL was not a problem.
estebank almost 12 years ago
> "And why has no one attempted something like this before?"

"However, if you've been around the Python community long enough, you might also know that the GIL was already removed once before--specifically, by Greg Stein who created a patch against Python 1.4 in 1996." (Also mentioned in the OP)

More info can be seen at http://dabeaz.blogspot.nl/2011/08/inside-look-at-gil-removal-patch-of.html
gizmo686 almost 12 years ago
> using multiple threads to increase performance is at best a difficult task.

This isn't completely true. If you are doing anything that is not CPU-bound, using threads is trivial, as the GIL will allow you to perform I/O in parallel.
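
The point about I/O-bound threads can be sketched roughly as follows. This is a minimal illustration, not a benchmark: `fake_fetch` is a hypothetical stand-in for a blocking network or disk call, and `time.sleep` is used because it releases the GIL while waiting, much as blocking I/O does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(task_id, delay=0.2):
    """Hypothetical stand-in for a blocking network or disk call."""
    time.sleep(delay)  # sleeping releases the GIL, as blocking I/O does
    return task_id


def fetch_all(task_ids, workers=8):
    """Run fake_fetch for every id on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, task_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = fetch_all(range(8))
    elapsed = time.perf_counter() - start
    # The eight 0.2 s waits overlap, so the total should be well under 1.6 s.
    print(results, f"{elapsed:.2f}s")
```

Run sequentially, the same eight calls would take about 1.6 seconds; with the waits overlapping, the wall-clock time should be close to a single call.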
rdtsc almost 12 years ago
> Due to the design of the Python interpreter, using multiple threads to increase performance is at best a difficult task. At worst, it will decrease (sometimes significantly) the speed of your program.

Nope. The writer sounds misinformed and is spreading FUD.

I have successfully used Python's threads to perform concurrent database fetches, HTTP page fetches, and file uploads in parallel. Yes, there was almost linear speedup.

If you listen to this story it sounds like Guido and most other talented and smart Python contributors added threads to Python just to fuck with people's heads -- "threads don't work but let's add them anyway! just to mess with them!". Nope, they added them because there are many cases when they work.

The answer is that if you handle concurrent I/O, Python's threads will give you good speedup. Threads are real OS threads and come with nasty side-effects if using shared data structures, but make no mistake, you will get the speedup.

Your mileage may vary and everyone is probably biased and has a different perspective, but where I am coming from, in the last 10+ years I have written mostly I/O-bound concurrent code. There were very few cases where I hoped to use extra CPU concurrency.

Now I did have to do that a couple of times, and if you do have that issue, most likely you'd want to descend down to C anyway. Which is what I did. Once in C you can release the lock so Python can process in parallel and your C extension/driver can process in parallel. This is exactly what I did.

Now wouldn't it be nice if Python had CPU-level concurrency built in? Yes, it would be great. But I don't think that is the #1 issue currently. We still don't have 16 cores on most machines.

    #define RANT

What worries me is library fragmentation and the new Python 3 adoption (or lack thereof), now coupled with the new Async IO Future/Promise/Deferred framework introduction. That will harm Python faster and worse than the GIL ever did.

Adopting and standardizing a Twisted-like approach to Async IO will put the nail in Python's coffin. And Guido is certainly marching in that direction. This will fragment existing (already rather fragmented) libraries. Now we'll have Twisted, Tornado, gevent, eventlet, asyncore, Threads, new Promise/Future thingie (anyone know of more?) ways of doing concurrent IO, and every time you pick a library (unless you use threads, gevent or eventlet + monkey patching) you will end up choosing a whole new _ecosystem_ of frameworks.

I remember for years scouring the web for a Twisted version of an already existing library, because I had made the mistake of picking Twisted as the I/O concurrency framework. A regular library module is available, oh, but I need to return a Deferred from it of course, in order for me to use it.

    #undef RANT
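
The remark above about C code releasing the lock can be seen from pure Python too, since some standard-library C code already does this. The sketch below assumes `hashlib`, whose documentation says the GIL is released while hashing inputs larger than 2047 bytes; the helper names `digest` and `digest_all` are made up for illustration, and any speedup depends on the machine and build.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    """Hash one blob; hashlib's C code releases the GIL for large inputs."""
    return hashlib.sha256(data).hexdigest()


def digest_all(blobs, workers=4):
    """Hash several blobs on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blobs))


if __name__ == "__main__":
    # Four 8 MiB buffers, each large enough for the GIL to be released
    # while the C hashing loop runs.
    blobs = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for hex_digest in digest_all(blobs):
        print(hex_digest)
```

A hand-written C extension would do the equivalent around its own long-running section; this example only shows the effect from the Python side.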
aba_sababa almost 12 years ago
This is a great overview - nicely done! It would have been nice to also mention other implementations of Python, like Jython, that _don't_ have the GIL, and how they managed to do it.

As for why it hasn't been solved yet... the API for threads and processes is pretty much identical. Since you're just as well off using a process in the majority of cases, that's what we go with.
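
To illustrate how similar the thread and process APIs are, here is a small sketch using `concurrent.futures`, where both pool types share one interface. The function names `count_primes` and `run_with` are hypothetical, chosen only for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, workers=2):
    """Map count_primes over limits using whichever executor class is given."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000, 30_000]
    # Same call, different executor: threads share one GIL, processes do not.
    print(run_with(ThreadPoolExecutor, limits))
    print(run_with(ProcessPoolExecutor, limits))
```

Swapping the executor class is the only change needed, although the process version requires the function and its arguments to be picklable.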
montecarl almost 12 years ago
This article is written as if shared memory is the only parallel programming paradigm. While it is true that threads are a very useful construct for writing high-performance parallel software, distributed memory programming is also a valid approach.

If your problem can map to distributed memory techniques, then you have multiple advantages over shared memory programming. Most importantly, you can parallelize over multiple machines. Other advantages include decoupling of each parallel task from the others (fewer race conditions and other hard-to-debug problems).

There are several ways to achieve distributed memory parallelism in Python: multiprocessing, ZeroMQ, raw TCP/IP sockets and mpi4py. Which approach makes sense to use will depend on your problem.
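
As a rough sketch of the distributed-memory style on a single machine, the example below uses `multiprocessing` with message queues and no shared state. The names `worker` and `square_with_processes` are invented for illustration, and the `None` sentinel is just one common way to tell workers to stop.

```python
import multiprocessing as mp


def worker(inbox, outbox):
    """Square each number received; stop at the None sentinel."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put((item, item * item))


def square_with_processes(numbers, workers=2):
    """Send numbers to worker processes and collect {number: square}."""
    numbers = list(numbers)
    inbox, outbox = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(inbox, outbox))
             for _ in range(workers)]
    for proc in procs:
        proc.start()
    for n in numbers:
        inbox.put(n)
    for _ in procs:
        inbox.put(None)  # one stop message per worker
    # Drain the results before joining so no worker blocks on a full queue.
    results = dict(outbox.get() for _ in numbers)
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    print(square_with_processes([1, 2, 3, 4]))
```

Because the workers communicate only through messages, the same structure carries over to sockets, ZeroMQ or MPI when the work has to span several machines.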
ChuckMcM almost 12 years ago
Neat discussion on this. It's interesting to look at stuff where threading was front and center (Go, Java, etc.) vs. where threading was not (Python, Perl, etc.). The arguments are something you should at least get an introduction to with a CS degree, and one of the things people without that explicit teaching develop by feel.

Concurrency is the 'tricky bit' of the 'algorithms' pillar [1].

[1] The Four arbitrary pillars of computing (algorithms, languages, systems, and data structures) I
peripetylabs almost 12 years ago
I'm glad I came across this article. I'm learning Python and was given the advice to use multiprocessing rather than threading, but hadn't researched why. Very informative, thanks for sharing.
brass9 almost 12 years ago
What about Google's Unladen Swallow project? I'm aware their attempt to remove the GIL was aborted. Did any enhancements from that project find their way into mainline CPython 3k?
conover almost 12 years ago
Is there really that much Python code out there that is not IO bound (obviating the GIL)? Scientific computing is the only area that comes to mind where I can imagine problems.
Ihmahr almost 12 years ago
I _love_ to use the Python parallel map function.
MostAwesomeDude almost 12 years ago
Disagree; Python's hardest problem is packaging, followed closely by bikeshedding over networking libraries and an ever-growing dichotomy between the Python 3 and Python 2 universes of code.

You can _skip_ all conversation about the GIL and threads neatly by simply preferring a different concurrency model. There are plenty of ways to do this (see above re: bikesheds and their colors), but being permanently tied to CPython and the threading module is increasingly uncommon for professional Python, and it isn't as unavoidable as things like networking or even _which language you're going to use_.

Edit: I see that the author's in this thread. Nicely written article, but a tad hyperbolic.