
The Parallelism Blues: when faster code is slower

36 points by bibyte about 5 years ago

7 comments

longemen3000 about 5 years ago
In Julia, where the parallelization options are explicit (SIMD, AVX, threads, or multiprocessing), it always depends on the load: for small operations (around 10,000 elements) a single thread is faster, if only because of the thread-spawning time (around 1 microsecond). And there is the issue of the independent threaded BLAS model, where the BLAS threads sometimes interfere with Julia's threads... In a nutshell, parallelization is not a magic bullet, but it is a good bullet to have at your disposal anyway.
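The overhead the comment describes is easy to reproduce in Python (used here instead of Julia): for a small workload, the cost of spawning and coordinating threads can exceed the work itself, so the serial version wins. This is a minimal sketch; the worker count and chunking scheme are arbitrary choices, not anything from the article.

```python
# Sketch: serial vs. threaded summation of a small array, to show that
# thread spawn/coordination overhead dominates small workloads.
import time
from concurrent.futures import ThreadPoolExecutor

data = list(range(10_000))  # a "small" operation, as in the comment

def serial_sum(xs):
    return sum(xs)

def threaded_sum(xs, workers=4):
    # Creating the pool on every call is deliberate: it models spawn overhead.
    chunk = len(xs) // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(sum, (xs[i * chunk:(i + 1) * chunk] for i in range(workers)))
        return sum(parts)

t0 = time.perf_counter(); s = serial_sum(data); serial_t = time.perf_counter() - t0
t0 = time.perf_counter(); p = threaded_sum(data); threaded_t = time.perf_counter() - t0
assert s == p
print(f"serial {serial_t * 1e6:.0f} us vs threaded {threaded_t * 1e6:.0f} us")
```

On a typical machine the threaded version is noticeably slower here; only once the per-chunk work dwarfs the spawn cost does threading start to pay off.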
AzN1337c0d3r about 5 years ago
There's also the problem of Turbo Boost.

My laptop's 9980HK will boost to ~4.5 GHz when only a single core is loaded. However, when I load up all 8 cores, it might only sustain ~3.5 GHz. Therefore the 8 cores might not actually complete the work 8 times as fast, only about 6.2x (8 × [3.5/4.5]) in real time, due to the lowered clock rate of each individual core.

This will show up as additional user time, since each individual core can do less work per unit of time (per second) than in the single-core case.
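The arithmetic behind that 6.2x figure, using the commenter's numbers (single-core turbo 4.5 GHz, all-core turbo 3.5 GHz on 8 cores):

```python
# Effective speedup when the all-core clock is lower than the
# single-core clock (numbers taken from the comment above).
cores = 8
single_core_ghz = 4.5
all_core_ghz = 3.5

effective_speedup = cores * all_core_ghz / single_core_ghz
print(f"{effective_speedup:.1f}x")  # -> 6.2x, not the ideal 8x
```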
maweki about 5 years ago
"It would be extremely surprising, then, if running with N threads actually gave ×N performance."

Basically impossible by Amdahl's law.
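Amdahl's law, as invoked above, can be written in a few lines: with serial fraction s, the best possible speedup on N workers is 1 / (s + (1 − s)/N), which only reaches N when s is exactly zero. The 5% figure below is an illustrative choice, not a number from the article.

```python
# Amdahl's law: upper bound on speedup given a serial fraction of the work.
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# Even 5% serial work caps 8 workers at about 5.9x:
print(round(amdahl_speedup(0.05, 8), 2))  # -> 5.93
```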
_bxg1 about 5 years ago
None of this is surprising, right? Unless your system has fewer threads than cores (which it probably doesn't, even without your program) there will always be some context-switching overhead. It's worth keeping in mind, I guess - especially the fact that numpy parallelizes transparently - but generally these results are to be expected.

The title is also misleading; it suggests that the *wall-clock* time might be longer for parallel code in certain cases. While not impossible, that isn't what the article covers.
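The transparent numpy parallelism mentioned above comes from the underlying BLAS library, and it can be reined in via environment variables. A sketch, under the assumption that the installed numpy links a BLAS build that honors these variables (they must be set before numpy is first imported):

```python
# Cap BLAS thread counts *before* importing numpy; which variable applies
# depends on the BLAS backend the numpy build links against.
import os
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP-backed BLAS builds
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL

import numpy as np

a = np.random.rand(500, 500)
b = a @ a  # this matrix multiply now runs single-threaded in most builds
print(b.shape)
```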
ncmncm about 5 years ago
The article uses the term "parallelism" when it is talking, instead, about concurrency.

Parallelism is specifically the stuff that actually does happen completely independently on all processing units, that actually goes Nx as fast on N units (clock depression aside). Concurrency refers to the overhead of coordinating the activity of those units, which keeps you from getting your Nx. It is overhead on top of any actually serial parts of the computation, which Amdahl's law addresses.

In other words: parallelism giveth, and concurrency taketh away.

The distinction gets more useful the more you think about the subject.
yaroslavvb about 5 years ago
On a 64-core Xeon E5, this example gives a 1.8x increase in user time but an 8x decrease in wall-clock time.
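The user-time/wall-time distinction these numbers rest on is easy to measure in Python: time.process_time() accumulates CPU time across all threads of the process, while time.perf_counter() measures elapsed wall-clock time. A minimal sketch (workload and worker count are arbitrary):

```python
# Measure CPU (user+system) time vs. wall-clock time for a threaded workload.
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

wall0, cpu0 = time.perf_counter(), time.process_time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(busy, [200_000] * 4))
wall = time.perf_counter() - wall0
cpu = time.process_time() - cpu0
# Parallelism that helps shows cpu > wall (more total CPU time packed into
# less real time); overhead shows cpu rising without wall falling.
print(f"wall {wall:.3f}s, cpu {cpu:.3f}s")
```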
wtracy about 5 years ago
This is perfectly normal behavior when Intel Hyper-Threading is involved.

I'm on my phone, so rather than trying to type out an explanation, I'm going to link to Wikipedia: https://en.wikipedia.org/wiki/Hyper-threading