TechEcho


© 2025 TechEcho. All rights reserved.

Cost of a thread in C++ under Linux

194 points by eaguyhn, about 5 years ago

12 comments

drmeister, about 5 years ago
Threads are very expensive if you start throwing C++ exceptions within them in parallel. You can see the overall time to join the threads increase with each thread you add. There is a mutex in the unwinding code, and as the threads grab the mutex they invalidate each other's cache lines. I wrote a demo to illustrate the problem: https://github.com/clasp-developers/ctak

macOS doesn't have this problem, but Linux and FreeBSD do.
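A minimal C++ sketch of the kind of measurement described above, assuming a Linux toolchain with std::thread. It is not the ctak demo, and the names throw_and_catch and run_parallel_throws are hypothetical. Each thread throws and catches its own exceptions and shares no data with the others, so if elapsed time still grows with the thread count, that would suggest contention inside the runtime rather than in the program itself.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>
#include <vector>

// Throws and catches `iterations` exceptions on the calling thread.
static long throw_and_catch(int iterations) {
    long caught = 0;
    for (int i = 0; i < iterations; ++i) {
        try {
            throw std::runtime_error("demo");
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return caught;
}

// Runs `threads` workers in parallel, each throwing its own exceptions, and
// returns the total number caught. Wall-clock time from the first thread
// launch to the last join is stored in `elapsed_ms`.
long run_parallel_throws(int threads, int iterations, double& elapsed_ms) {
    assert(threads > 0 && iterations >= 0);
    std::vector<long> counts(threads, 0);
    std::vector<std::thread> pool;
    pool.reserve(threads);
    auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < threads; ++t) {
        // Each thread writes only its own slot, once, after its loop finishes.
        pool.emplace_back([&counts, t, iterations] { counts[t] = throw_and_catch(iterations); });
    }
    for (auto& th : pool) th.join();
    auto stop = std::chrono::steady_clock::now();
    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    long total = 0;
    for (long c : counts) total += c;
    return total;
}
```

Calling it with, say, 1, 2, 4 and 8 threads at a fixed iteration count and comparing `elapsed_ms` is one way to check the effect on a given machine; results depend on the compiler, C library and unwinder versions in use.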
boulos, about 5 years ago
I find Eli Bendersky’s writeup [1] more useful, as it goes further into the details. For less familiar readers, it also makes clearer what the time spent depends on (how much state there is to copy). Eli’s post is actually a sub-post of his “cost of context switching” post [2], which is more often applicable (and helps answer all the questions below about thread pools).

[1] https://eli.thegreenplace.net/2018/launching-linux-threads-and-processes-with-clone/

[2] https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/
bluetomcat, about 5 years ago
For CPU-bound tasks, it is best to pre-create a number of threads roughly equal to the number of logical execution cores. Every thread is then a worker with a main loop rather than being spawned on demand. Pin each thread's affinity to a specific core and you are as close as possible to the “perfect” arrangement, with minimized context switches and core-local cached data present most of the time.
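A minimal sketch of the pinning part, assuming Linux with glibc (pthread_setaffinity_np and the CPU_* macros are GNU extensions; g++ defines _GNU_SOURCE by default). It starts one worker per CPU the process is allowed to run on, taken from sched_getaffinity, because std::thread::hardware_concurrency() may report more CPUs than taskset or a container permits. For brevity each worker sums a fixed slice of a range instead of pulling tasks in a long-lived main loop; the names allowed_cpus, pin_self_to and pinned_sum are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <thread>
#include <vector>

// Returns the CPUs this process may run on (respects taskset/cgroup limits).
std::vector<int> allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
            if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
    }
    return cpus;
}

// Pins the calling thread to one CPU; returns 0 on success or an errno value.
int pin_self_to(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Starts one worker per allowed CPU, pins each to its own core, and sums
// 0..n-1 by giving worker w the elements w, w + workers, w + 2*workers, ...
long long pinned_sum(long long n) {
    assert(n >= 0);
    std::vector<int> cpus = allowed_cpus();
    if (cpus.empty()) cpus.push_back(-1);  // affinity unknown: run unpinned
    const long long workers = static_cast<long long>(cpus.size());
    std::atomic<long long> total{0};
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < cpus.size(); ++w) {
        pool.emplace_back([&, w] {
            if (cpus[w] >= 0) pin_self_to(cpus[w]);  // best effort; result ignored
            long long local = 0;
            for (long long i = static_cast<long long>(w); i < n; i += workers) local += i;
            total += local;  // one atomic add per worker
        });
    }
    for (auto& t : pool) t.join();
    return total.load();
}
```

In a long-running program the workers would stay alive and loop over a task queue instead of being joined after one slice; the affinity call would be made once, at worker startup, as here.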
shin_lao, about 5 years ago
Great reminder.

Even if you pre-create threads (a thread pool), when a task is small enough (less than 1,000 cycles), it is cheaper to run it in place (for example, with fibers) because of the cost of context switching.
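A rough way to see this overhead, as a sketch under stated assumptions: it times tiny tasks run inline against the same tasks handed one at a time to a single pre-created worker thread through a mutex and condition variable. The single worker is a simple stand-in for a pool, fibers are not shown, and the names tiny_task, Timing and compare_inline_vs_handoff are hypothetical. Absolute numbers will vary widely by machine and by how aggressively the compiler optimizes the inline loop.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// A tiny task: far fewer than 1,000 cycles of work.
static long long tiny_task(long long x) { return x * 2 + 1; }

struct Timing {
    long long inline_sum;
    long long handoff_sum;
    double inline_ns_per_task;
    double handoff_ns_per_task;
};

// Runs `n` tiny tasks inline, then hands each one to a pre-created worker
// thread and waits for its result (one synchronous round trip per task).
Timing compare_inline_vs_handoff(int n) {
    assert(n > 0);
    using clock = std::chrono::steady_clock;
    Timing r{0, 0, 0.0, 0.0};

    auto t0 = clock::now();
    for (int i = 0; i < n; ++i) r.inline_sum += tiny_task(i);
    auto t1 = clock::now();

    std::mutex m;
    std::condition_variable cv;
    bool has_task = false, has_result = false, stop = false;
    long long arg = 0, result = 0;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            cv.wait(lock, [&] { return has_task || stop; });
            if (stop) return;
            result = tiny_task(arg);
            has_task = false;
            has_result = true;
            cv.notify_all();  // wake the submitting thread
        }
    });

    auto t2 = clock::now();
    for (int i = 0; i < n; ++i) {
        std::unique_lock<std::mutex> lock(m);
        arg = i;
        has_task = true;
        cv.notify_all();  // wake the worker
        cv.wait(lock, [&] { return has_result; });
        has_result = false;
        r.handoff_sum += result;
    }
    auto t3 = clock::now();

    {
        std::lock_guard<std::mutex> lock(m);
        stop = true;
    }
    cv.notify_all();
    worker.join();

    r.inline_ns_per_task = std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
    r.handoff_ns_per_task = std::chrono::duration<double, std::nano>(t3 - t2).count() / n;
    return r;
}
```

Both paths compute the same sum; the gap between `inline_ns_per_task` and `handoff_ns_per_task` approximates the per-task cost of waking another thread and waiting for it.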
hrgiger, about 5 years ago
Pinning with taskset improves my numbers:

$ taskset --cpu-list 8 ./costofthread
avg: ~11000

$ taskset --cpu-list 8,11 ./costofthread
avg: ~33000

$ ./costofthread
avg: ~60000
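A hypothetical stand-in for a program like ./costofthread (the article's actual source is not reproduced here): a minimal sketch that averages the wall-clock cost of creating and joining a std::thread with an empty body. Building it into a small executable and running it under taskset with one CPU, two CPUs and no restriction allows the kind of comparison above; the function name avg_create_join_ns is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Average wall-clock nanoseconds to create and join one std::thread whose
// body does nothing, measured over `iterations` sequential create/join pairs.
double avg_create_join_ns(int iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread t([] {});
        t.join();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```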
saagarjha, about 5 years ago
Is a std::thread a thin wrapper around pthreads on Linux?
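One way to check this on a given toolchain, as a sketch assuming libstdc++ or libc++ on Linux, where std::thread::native_handle_type is pthread_t: the static_assert fails to compile if that assumption does not hold, and the function (handle_matches_pthread_self is a hypothetical name) compares the handle the std::thread object exposes with pthread_self() called inside the new thread.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>
#include <type_traits>

// On Linux with libstdc++ or libc++, std::thread's native handle is a pthread_t.
static_assert(std::is_same<std::thread::native_handle_type, pthread_t>::value,
              "expected std::thread to be built on pthreads");

// Returns true if pthread_self() inside the thread equals the handle that the
// std::thread object exposes through native_handle().
bool handle_matches_pthread_self() {
    pthread_t inside{};
    std::thread t([&inside] { inside = pthread_self(); });
    assert(t.joinable());
    pthread_t outside = t.native_handle();  // valid until the thread is joined
    t.join();                               // join makes `inside` safe to read
    return pthread_equal(inside, outside) != 0;
}
```

Because the handle is a plain pthread_t, pthread-only calls (affinity, scheduling parameters, thread names) can be applied to a std::thread through native_handle().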
known, about 5 years ago
On any architecture, you may need to reduce the amount of stack space allocated for each thread to avoid running out of virtual memory:

http://www.kegel.com/c10k.html#limits.threads
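Since std::thread offers no way to set the stack size, a minimal sketch has to drop down to the pthreads API, assuming Linux/glibc: pthread_attr_setstacksize sets the size before pthread_create. For scale, if the default is 8 MiB (a common glibc default, derived from `ulimit -s`), 1,000 threads reserve 8,000 MiB, roughly 8 GiB of address space, even though most of it is never touched. run_with_stack_size and worker are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Thread body: does nothing; a real worker would run its loop here.
static void* worker(void*) { return nullptr; }

// Creates and joins one thread with a caller-chosen stack size instead of the
// default. Returns 0 on success, otherwise the error code from the failing
// pthread call (e.g. EINVAL if the size is below PTHREAD_STACK_MIN).
int run_with_stack_size(std::size_t stack_bytes) {
    assert(stack_bytes > 0);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, worker, nullptr);
        if (rc == 0) rc = pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```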
isatty, about 5 years ago
Why is there such a big difference in timing between Skylake and Rome? Is it something compiler-specific? The number of steps required to create a thread should be identical.

I’d also be interested to see the same benchmark using pthread_create directly.
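A minimal sketch of the pthread_create variant, assuming Linux: the same create-and-join loop as a std::thread benchmark, but calling pthread_create and pthread_join directly, so a difference between the two measurements would point at the C++ wrapper rather than the kernel. The names noop and avg_pthread_create_join_ns are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <pthread.h>

// Thread body that returns immediately.
static void* noop(void*) { return nullptr; }

// Average nanoseconds to create and join one thread via pthread_create
// directly. Returns a negative value if any pthread call fails.
double avg_pthread_create_join_ns(int iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, noop, nullptr) != 0) return -1.0;
        if (pthread_join(tid, nullptr) != 0) return -1.0;
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```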
maayank, about 5 years ago
Why the relatively high cost of threads on ARM? If anything, I'd imagine it is more geared towards "massively parallel" scenarios (i.e. dozens of cores).
Koshkin, about 5 years ago
Intel’s excellent TBB library is the answer to all your worries about threads in C++. (IMHO it should be made part of the standard library.)
signa11, about 5 years ago
IMHO, if the _cost_ of thread creation is your bottleneck, then more likely than not you are doing things wrong.
brainscdf, about 5 years ago
My personal best practice is to always create a thread pool at program startup and distribute tasks among its threads. I follow the same practice in all other languages too. Is this practice sound, or can it lead to problems in some corner cases?
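A minimal sketch of the startup-pool practice described above, in C++11 or later: a fixed number of threads created once, pulling std::function tasks from a shared queue, with submit returning a std::future for each result. The ThreadPool class is hypothetical, not a standard-library or third-party type.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool: threads are created once and pull tasks from a shared queue.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        if (n == 0) n = 1;
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain queued tasks before exiting
    }
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues `f` and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!stop_);  // submitting during shutdown is a caller bug
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run the task outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};
```

One corner case this design exposes: if tasks block waiting on the futures of other tasks submitted to the same pool, and every worker ends up blocked that way, the pool deadlocks. Long blocking I/O inside tasks similarly ties up workers that CPU-bound tasks could otherwise use.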