
How to share a NumPy array between processes

67 points by jasonb05 over 1 year ago

5 comments

westurner over 1 year ago
Though deprecated, probably in favor of more of a database/DBMS like DuckDB, the Arrow Plasma store holds handles to objects as a separate process:

    $ plasma_store -m 1000000000 -s /tmp/plasma

Arrow arrays are like NumPy arrays, but they are built for zero-copy use, e.g. IPC (Inter-Process Communication). There's a dtype_backend kwarg on the pandas read_* methods:

    df = pandas.read_csv("data.csv", dtype_backend="pyarrow")

The Plasma In-Memory Object Store > Using Arrow and Pandas with Plasma > Storing Arrow Objects in Plasma: https://arrow.apache.org/docs/dev/python/plasma.html#storing-arrow-objects-in-plasma

Streaming, Serialization, and IPC: https://arrow.apache.org/docs/python/ipc.html

"DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB" (2021): https://duckdb.org/2021/12/03/duck-arrow.html
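
A minimal sketch of the zero-copy IPC idea with current pyarrow (the file path is illustrative): one process writes a NumPy-backed Arrow table to an IPC file, another memory-maps it, so nothing is copied until the data is touched:

    import numpy as np
    import pyarrow as pa

    # Producer: wrap a NumPy array in an Arrow table and write it as an IPC file.
    table = pa.table({"values": pa.array(np.arange(1_000_000, dtype=np.int64))})
    with pa.OSFile("/tmp/shared.arrow", "wb") as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)

    # Consumer (possibly another process): memory-map the file and get a
    # zero-copy NumPy view of the primitive column.
    with pa.memory_map("/tmp/shared.arrow", "r") as source:
        loaded = pa.ipc.open_file(source).read_all()
        view = loaded["values"].chunk(0).to_numpy(zero_copy_only=True)
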
arjvik over 1 year ago
Good content, but over-optimized for SEO. Would be nice to hear about the actual efficiency of these methods.

For instance, does fork() copy the page of memory containing the array? I believe it's copy-on-write semantics, right? What happens when the parent process changes the array?

Then, how do Pipe and Queue send the array across processes? Do they also pickle and unpickle it? Use shared memory?
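
For the record: on Linux, fork() does give copy-on-write semantics (reads share the parent's pages; a write faults in a private copy of just the touched pages), and multiprocessing's Queue and Pipe pickle the array on the way through. A quick sketch to observe the CoW behaviour (POSIX-only):

    import numpy as np
    from multiprocessing import get_context

    data = np.zeros(10_000_000)   # allocated in the parent

    def child():
        print("child reads parent's pages:", data.sum())  # no copy on read
        data[0] = 1.0   # copy-on-write: only the touched pages are duplicated

    if __name__ == "__main__":
        ctx = get_context("fork")             # fork is not available on Windows
        p = ctx.Process(target=child)
        p.start()
        p.join()
        print("parent unaffected:", data[0])  # still 0.0; the child's write was private
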
pplonski86 over 1 year ago
I was searching for a similar article. I'm working on an AutoML Python package where I use different packages to train ML models on tabular data. Very often the memory is not properly released by external packages, so the only way to manage memory is to execute training in separate processes.
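
A sketch of that isolation pattern; fit_model and configs are hypothetical stand-ins for the external training call and its settings:

    from multiprocessing import get_context

    def train_one(config, queue):
        model = fit_model(config)   # fit_model: hypothetical external training call
        queue.put(model.score)      # send back only small results, not the model

    if __name__ == "__main__":
        ctx = get_context("spawn")
        for config in configs:      # configs: hypothetical list of settings
            q = ctx.Queue()
            p = ctx.Process(target=train_one, args=(config, q))
            p.start()
            score = q.get()         # read before join to avoid blocking on a full queue
            p.join()                # process exit returns all leaked memory to the OS
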
KeplerBoy over 1 year ago
Actually ran into this problem this week; toyed around with multiprocessing.shared_memory (which seems to also rely on mmapped files, right?) and decided to just embrace the GIL.

Multiprocessing is not needed when your handful of threads are just calling NumPy code and releasing the GIL anyway.

Also, some/most NumPy functions are multithreaded (depending on the BLAS implementation they are linked against); take advantage of that, schedule huge operations, and just let the interpreter sit idle waiting for the result.
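
A sketch of that thread-based approach: since NumPy releases the GIL inside large array operations, a plain thread pool shares the arrays with no pickling or shared-memory setup at all:

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    arrays = [np.random.rand(2000, 2000) for _ in range(4)]

    def heavy(a):
        return a @ a.T   # BLAS matmul; NumPy releases the GIL while it runs

    # Threads see the same address space, so the arrays are shared for free.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(heavy, arrays))
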
schoetbi over 1 year ago
There is also Apache Arrow, which addresses a similar use case. Maybe this is worth considering: https://arrow.apache.org/docs/python/memory.html#on-disk-and-memory-mapped-files
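
The same memory-mapping idea is also available in plain NumPy via np.memmap; a minimal sketch (the path is illustrative):

    import numpy as np

    # Writer process: create a file-backed array and fill it.
    a = np.memmap("/tmp/shared.dat", dtype=np.float64, mode="w+", shape=(1000,))
    a[:] = np.arange(1000.0)
    a.flush()

    # Reader process: map the same file; the OS shares the pages between processes.
    b = np.memmap("/tmp/shared.dat", dtype=np.float64, mode="r", shape=(1000,))
    print(b[:5])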