Parallelising Python with Threading and Multiprocessing

94 points by shogunmike · about 11 years ago

10 comments

bquinlan · about 11 years ago
I'd like to point out that the Python standard library offers an abstraction over threads and processes that simplifies the kind of concurrent work described in the article: https://docs.python.org/dev/library/concurrent.futures.html

You can write the threaded example as:

    import concurrent.futures
    import random

    def generate_random(count):
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            executor.submit(generate_random, 10000000)
            executor.submit(generate_random, 10000000)
            # I guess we don't care about the results...

Changing this to use multiple processes instead of multiple threads is just a matter of s/ThreadPoolExecutor/ProcessPoolExecutor.

You can also write this more idiomatically (and collect the combined results) as:

    if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            out_list = list(
                executor.map(lambda _: random.random(), range(20000000)))

In this example case it will be quite a bit slower, because the work item (generating a single random number) is trivial compared to the overhead of maintaining a work queue of 20,000,000 items - but in a more typical case, where each work item takes more than a millisecond, it is better to let the executor manage the division of labour.
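As an illustration of that last point (a sketch, not from the original comment): the ProcessPoolExecutor variant with two coarse-grained work items, so the queue overhead is amortised across whole chunks:

    import concurrent.futures
    import random

    def generate_random(count):
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        # Two coarse work items instead of 20,000,000 tiny ones.
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            chunks = executor.map(generate_random, [10000000, 10000000])
            out_list = [x for chunk in chunks for x in chunk]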
halayli · about 11 years ago
This example is not very realistic; it narrows things down to the case where a job can be divided into isolated tasks with no shared data/state.

Often, threads need to update a shared dict/list, etc. With multiprocessing this cannot be done. You can use a Queue for this, but it's horribly inefficient.

Generally speaking, if you need performance and Python is not meeting the requirements, you are better off using another language.
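For illustration, here is a minimal sketch of the shared-state pattern the comment describes: two threads appending to one list under a Lock. With separate processes each worker would get its own copy of the list, which is the limitation being pointed out.

    import random
    import threading

    shared = []
    lock = threading.Lock()

    def worker(count):
        for _ in range(count):
            value = random.random()
            with lock:  # serialise access to the shared list
                shared.append(value)

    threads = [threading.Thread(target=worker, args=(100000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(shared))  # 200000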
tiger10guy · about 11 years ago
For everyday cases where I want to make embarrassingly parallel operations in Python go fast, I find joblib to be a pretty good solution. It doesn't work for everything, but it's quick and simple where it does work.

https://pythonhosted.org/joblib/
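A rough sketch of the joblib pattern, using Parallel and delayed; n_jobs and the generate_random helper are just illustrative:

    import random
    from joblib import Parallel, delayed

    def generate_random(count):
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        # Each delayed call becomes one work item for the worker processes.
        out_lists = Parallel(n_jobs=2)(
            delayed(generate_random)(10000000) for _ in range(2))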
zo1 · about 11 years ago
I've had good success using Celery to parallelize tasks/jobs in Python.

www.celeryproject.org

Also, it has a very nice concept called canvas that allows you to chain/combine the data/results of different tasks together.

It also allows you to switch out different implementations of the communication infrastructure that Celery uses to communicate and dish out tasks.
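A hedged sketch of the canvas idea using a chord; the broker/backend URLs, task names and sizes are all illustrative, not taken from the article:

    import random
    from celery import Celery, chord

    app = Celery("tasks",
                 broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/1")

    @app.task
    def generate_random(count):
        return [random.random() for _ in range(count)]

    @app.task
    def combine(lists):
        return [x for lst in lists for x in lst]

    if __name__ == "__main__":
        # Run two generate_random tasks in parallel on the workers,
        # then feed both result lists into combine.
        result = chord(generate_random.s(1000000) for _ in range(2))(combine.s())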
dekhn · about 11 years ago
For Python developers who dislike the continued existence of the GIL in a multicore world, and who feel that multiprocessing is a poor response given the existence proofs of IronPython and Jython as non-GIL interpreter implementations, please consider moving to Julia.

Julia addresses nearly all the problems I've found with Python over the years, including poor performance, poor threading support on multicore machines, integration with C libraries, etc. I was a big adherent of Python, but as machines got more capable, the ongoing resistance to solving the GIL problem (which IronPython demonstrated can be done with reasonable impact on serial performance) meant I could not continue using the language except for legacy applications.
wbsun · about 11 years ago
Python threads, aren't they just single-threaded execution?!
matrixise · about 11 years ago
Have you seen that there is an error in the code for the threading part?

The right way to use a thread is:

thread = threading.Thread(target=CALLABLE, args=ARGS)

and not:

thread = threading.Thread(target=CALLABLE(ARGS))
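A small sketch of the difference; generate_random and its argument are stand-ins for whatever the article actually passes:

    import threading

    def generate_random(count):
        return [0.0] * count  # placeholder work

    # Wrong: generate_random(1000) runs immediately in the main thread and
    # its return value (a list, not a callable) becomes the thread's target.
    # broken = threading.Thread(target=generate_random(1000))

    # Right: the new thread calls generate_random(1000) itself when started.
    thread = threading.Thread(target=generate_random, args=(1000,))
    thread.start()
    thread.join()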
CraigJPerry · about 11 years ago
For the example task we could use the multiprocessing Pool and (the undocumented) ThreadPool.

This implements the worker-pool logic already, so we don't have to.
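Roughly like this, reusing the generate_random helper from the earlier comment as a stand-in for the article's task; ThreadPool lives in multiprocessing.pool:

    import random
    from multiprocessing import Pool
    from multiprocessing.pool import ThreadPool  # undocumented but importable

    def generate_random(count):
        return [random.random() for _ in range(count)]

    if __name__ == "__main__":
        pool = Pool(processes=2)  # swap in ThreadPool(2) for the threaded version
        results = pool.map(generate_random, [10000000, 10000000])
        pool.close()
        pool.join()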
thikonom · about 11 years ago
For network-bound operations, Twisted's cooperate / coiterate come in handy.
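A minimal sketch of task.cooperate; do_io here is a stand-in for real non-blocking network work:

    from twisted.internet import reactor, task

    def do_io(i):
        print("request", i)  # stand-in for issuing a non-blocking request

    def work(n):
        for i in range(n):
            do_io(i)
            yield None  # hand control back to the reactor between items

    d = task.cooperate(work(5)).whenDone()
    d.addCallback(lambda _: reactor.stop())
    reactor.run()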
eudox · about 11 years ago
Or just use a better language. One that is actually compiled and fast?