Threads become very expensive if you start throwing C++ exceptions within them in parallel: the overall time to join the threads increases with each thread you add. There is a mutex in the unwinding code, and as the threads grab the mutex they invalidate each other's cache lines. I wrote a demo to illustrate the problem: <a href="https://github.com/clasp-developers/ctak" rel="nofollow">https://github.com/clasp-developers/ctak</a><p>macOS doesn't have this problem, but Linux and FreeBSD do.
I find Eli Bendersky’s writeup [1] more useful, as it goes deeper into the details. For readers less familiar with the topic, it also makes clearer what the time spent depends on (how much state there is to copy). Eli’s post is actually a companion to his “cost of context switching” post [2], which is applicable more often (and helps answer all the questions below about thread pools).<p>[1] <a href="https://eli.thegreenplace.net/2018/launching-linux-threads-and-processes-with-clone/" rel="nofollow">https://eli.thegreenplace.net/2018/launching-linux-threads-a...</a><p>[2] <a href="https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/" rel="nofollow">https://eli.thegreenplace.net/2018/measuring-context-switchi...</a>
For CPU-bound tasks, it is best to pre-create a number of threads roughly equal to the number of logical execution cores, rather than spawning threads on demand. Every thread is then a worker with a main loop. Pin each worker's affinity to a specific core and you are as close as possible to the “perfect” arrangement: context switches are minimized and core-local cache data stays warm most of the time.
Great reminder.<p>Even with pre-created threads (a thread pool), when the task is small enough (less than roughly 1,000 cycles), it is cheaper to run it in place (or with fibers) than to hand it off, because of the cost of the context switch.
On any architecture, you may need to reduce the amount of stack space allocated for each thread to avoid running out of virtual memory.<p><a href="http://www.kegel.com/c10k.html#limits.threads" rel="nofollow">http://www.kegel.com/c10k.html#limits.threads</a>
Why is there such a big difference in timing between Skylake and Rome? Something compiler-specific? The number of steps required to create a thread should be identical.<p>I’d also be interested to see the same benchmark using pthread_create directly.
Why the relatively high cost of threads on ARM? If anything, I'd imagine it is more geared toward "massively parallel" scenarios (i.e., dozens of cores).
My personal best practice is to always create a thread pool on program startup and distribute tasks among its threads. I follow the same practice in all other languages too. Is this sound, or can it lead to problems in some corner cases?