科技回声

16 comments

singhrac, almost 2 years ago

I've spent a lot of time writing and debugging multiprocessing code, so a few thoughts, besides the general idea that this looks good and I'm excited to try it:

- Automatic restarting of workers after N tasks is very nice. I have had to hack that into places before because of (unresolvable) memory leaks in application code.
- Is there a way to attach a debugger to one of the workers? That would be really useful, though I appreciate the automatic reporting of the failing args (I also hack that in all the time).
- Often, the reason a whole set of jobs is not making any progress is a thundering herd on reading files (god forbid over NFS). It would be lovely to detect that using lsof or something similar.
- It would also be extremely convenient to have an option that handles a Python MemoryError and scales down the parallelism in that case. This is quite difficult but would help a lot, since I often have to run a "test job" to see how much parallelism I can actually use.
- I didn't see the library use threadpoolctl anywhere. Would it be possible to make that part of the interface so we can limit thread parallelism from OpenMP/BLAS/MKL when multiprocessing? This also often causes core thrashing.

Sorry for all the asks, and feel free to push back to keep the interface clean. I will give the library a try regardless.
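For readers unfamiliar with the worker-restart idea in the first bullet, the standard library's `multiprocessing.Pool` has a `maxtasksperchild` argument that does something similar. The sketch below is only an illustration of that stdlib option, not of how the library under discussion implements it. The function names `square_with_pid` and `run` are made up for the example, and the explicit `"fork"` start method is an assumption that the code runs on a POSIX system.

```python
import multiprocessing as mp
import os


def square_with_pid(x):
    # Return the result together with the PID of the worker that computed it.
    return x * x, os.getpid()


def run(n_tasks=8, tasks_per_worker=2):
    # Assumption: "fork" is requested explicitly so the sketch behaves the
    # same on any POSIX system, whatever the platform default is.
    ctx = mp.get_context("fork")
    # maxtasksperchild: each worker exits after this many tasks and the pool
    # starts a fresh process in its place, which bounds the damage a memory
    # leak in the task function can do.
    with ctx.Pool(processes=2, maxtasksperchild=tasks_per_worker) as pool:
        results = pool.map(square_with_pid, range(n_tasks), chunksize=1)
    squares = [value for value, _ in results]
    worker_pids = {pid for _, pid in results}
    return squares, worker_pids


if __name__ == "__main__":
    squares, worker_pids = run()
    print(squares)
    print(f"{len(worker_pids)} distinct worker processes were used")
```

With `chunksize=1` every item counts as one task, so eight items at two tasks per worker cannot all be handled by the two workers the pool starts with.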

milliams, almost 2 years ago

Why does everyone compare against `multiprocessing` when `concurrent.futures` (https://docs.python.org/3/library/concurrent.futures.html) has been a part of the standard library for 11 years? It's a much improved API and there are _almost_ no reasons to use `multiprocessing` any more.
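As a rough illustration of the `concurrent.futures` API mentioned here, the sketch below submits a few tasks to a `ProcessPoolExecutor` and collects results and errors per task through futures. The names `checked_sqrt` and `run` are invented for the example, and the explicit `"fork"` context is an assumption that the code runs on a POSIX system.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor, as_completed


def checked_sqrt(x):
    # Raise in the worker so the example can show error propagation.
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x ** 0.5


def run(inputs):
    # Assumption: "fork" is requested explicitly for predictable behaviour
    # on POSIX systems; omit mp_context to use the platform default.
    ctx = mp.get_context("fork")
    results, failures = {}, {}
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        # Map each future back to the argument it was submitted with.
        futures = {executor.submit(checked_sqrt, x): x for x in inputs}
        for future in as_completed(futures):
            x = futures[future]
            try:
                results[x] = future.result()
            except ValueError as exc:
                # An exception raised in the worker is re-raised here.
                failures[x] = str(exc)
    return results, failures


if __name__ == "__main__":
    print(run([4, 9, -1]))
```

Because each future is tied to its input, a failing task can be reported together with the argument that caused it, without aborting the remaining tasks.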

trostaft, almost 2 years ago

The particular pain point of multiprocessing in Python for me has been the limitations of the serializer. To that end, multiprocess, the replacement by the dill team, has been useful as a drop-in replacement, but I'm still looking for better alternatives. This seems to support dill as an optional serializer, so I'll take a look!

jw887c, almost 2 years ago

Multiprocessing is great as a first-pass parallelization, but I've found debugging it to be very hard, especially for junior employees.

It seems much easier to follow when you can push everything to horizontally scaled single processes for languages like Python.

miohtama, almost 2 years ago

Another good library for concurrency and parallel tasks is futureproof: https://github.com/yeraydiazdiaz/futureproof

> concurrent.futures is amazing, but it's got some sharp edges that have bit me many times in the past.

> Futureproof is a thin wrapper around it addressing some of these problems and adding some usability features.

anotherpaulg, almost 2 years ago

I often use lox for this sort of thing. It can use threads or processes, and has a very ergonomic API.

https://github.com/BrianPugh/lox

urcyanide, almost 2 years ago

Some potential issues with Python multiprocessing: https://blog.mapotofu.org/blogs/python-multiprocessing/. Copy-on-write (COW) is quite tricky. BTW, most of the related official Python docs don't mention the usage under 'spawn'.

liendolucas, almost 2 years ago

I've written a very tiny multiprocessing pipeline in Python. It's documented.

I've actually never made use of it, but at the time I got a bit obsessed and wanted to write it. It does seem to work as expected.

It is highly hackable, as it is only a single file and a couple of classes.

Maybe it is useful to someone. Here's the link: https://github.com/lliendo/SimplePipeline

amelius, almost 2 years ago

Very cool.

Except I'm a bit concerned that it might have too many features. E.g. rendering of progress bars and such. This should really be in a separate package and not referenced from this package.

The multiprocessing module might not be great, but at least the maintainers have always been careful about feature creep.

jmakov, almost 2 years ago

How is this different from ray.io?

IshKebab, almost 2 years ago

Why has Python never added something like Web Workers/isolates? That seems like the obvious thing to do, but they only have multiprocess hacks.

stainablesteel, almost 2 years ago

i see that all the benchmarks have ProcessPoolExecutor either equal to or outperforming multiprocessing, and i do not find this to be the case for about 90% of my cases.

also a niche question: is this able to overcome the inability to pickle a function defined within another function in order to multiprocess it?

i'm still excited to try this, as i haven't heard of it and good multiprocessing is hard to come by.
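The pickling limitation asked about here can be reproduced with the standard `pickle` module alone, which is what `multiprocessing` uses by default to send functions to workers. The sketch below shows the failure and one common stdlib workaround, `functools.partial` over a module-level function. It says nothing about how the library under discussion handles the case. The names `scale`, `make_nested`, and `can_pickle` are invented for the example.

```python
import pickle
from functools import partial


def scale(factor, x):
    # Module-level function: pickle can refer to it by its importable name.
    return factor * x


def make_nested(factor):
    # The inner function only exists inside this call, so pickle has no
    # importable name to refer to it by.
    def nested(x):
        return factor * x
    return nested


def can_pickle(obj):
    # The exact exception type differs between Python versions.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle(make_nested(3)))     # the nested closure
    print(can_pickle(partial(scale, 3)))  # the partial-based replacement
```

A `partial` object pickles as a reference to `scale` plus the bound argument, so it carries the same state as the closure while remaining transferable to a worker process.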

captaintobs, almost 2 years ago

Why is this faster than the stdlib? What does it do to achieve better performance?

darkteflon, almost 2 years ago

Would anyone be in a position to comment on how this compares to Dask?

MitPitt, almost 2 years ago

Always dreamed of multiprocessing with tqdm, this is great

bee_rider, almost 2 years ago

Ah darn, was hoping for some MPI Python interface.

Mpire: A Python package for easier and faster multiprocessing
