Needed when executing 1 million API requests, for example. Use the full force of your machine by utilising async IO via the excellent library Trio.<p>It has been tested in the wild with more than 10 million requests, and it automatically handles errors and retries.<p>It also provides helper functions for executing embarrassingly parallel async coroutines.
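To illustrate the general pattern (a minimal sketch only, not this library's actual API: the fetch_all name, the httpx client, the concurrency limit, and the retry/backoff policy are all assumptions), bounded-concurrency requests with retries under Trio might look like this:<p><pre><code>import trio
import httpx

async def fetch_all(urls, max_concurrency=100, retries=3):
    limiter = trio.CapacityLimiter(max_concurrency)
    results = {}

    async def fetch(client, url):
        async with limiter:  # cap the number of in-flight requests
            for attempt in range(retries):
                try:
                    resp = await client.get(url)
                    results[url] = resp.status_code
                    return
                except httpx.HTTPError:
                    await trio.sleep(2 ** attempt)  # crude exponential backoff

    async with httpx.AsyncClient() as client:
        async with trio.open_nursery() as nursery:
            for url in urls:
                nursery.start_soon(fetch, client, url)
    return results

# trio.run(fetch_all, list_of_urls)</code></pre>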
I'm curious why one would suggest using this over the built-in ThreadPoolExecutor class.<p>Here's a complete, similar example from the Python docs: <a href="https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example" rel="nofollow">https://docs.python.org/3/library/concurrent.futures.html#th...</a>.<p>Granted, that example is more lines of code than the README for this library, but the core of it is fairly straightforward:<p><pre><code>import concurrent.futures
import requests

rsps = []
with concurrent.futures.ThreadPoolExecutor(max_workers=N) as executor:
    futures = [executor.submit(requests.get, url) for url in urls]
    for future in concurrent.futures.as_completed(futures):
        rsps.append(future.result())  # result() re-raises any exception from the request</code></pre>
Oh, handy. This would have solved the exact problem I had last month (which I ended up solving with just-good-enough fiddling in async Python, which I'm not yet comfortable with).
I have a somewhat related question:<p>What is the current best-practice way to parallelize reading many Parquet files on S3 using pyarrow? Deep down, it's just lots of HTTP requests.
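Not an authoritative answer, but one common pattern is to let pyarrow's dataset API do the fan-out: it lists the Parquet files under a prefix and issues the underlying S3 reads on its internal IO thread pool. A sketch (the bucket path, region, and thread count are placeholder assumptions):<p><pre><code>import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import fs

# Placeholder region and bucket -- adjust for your data.
s3 = fs.S3FileSystem(region="us-east-1")

# Widen pyarrow's IO thread pool when there are many small files.
pa.set_io_thread_count(32)

# Treat the whole prefix as one dataset; fragments are read concurrently.
dataset = ds.dataset("my-bucket/path/to/data/", format="parquet", filesystem=s3)
table = dataset.to_table()</code></pre>
If you need per-file control, the same fan-out can be done by hand with concurrent.futures and pyarrow.parquet.read_table.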