Not sure where else to post.<p>I work on some CPU-intensive software written in C++ and multithreaded with OpenMP. This software has not been scaling as I would expect on my M1 Max MacBook. Notably, when I use 10 threads, Activity Monitor shows that the process has the expected number of threads, but on long runs it only seems to get scheduled on the two efficiency cores. This is not ideal.<p>I’m not sure where to begin diagnosing this issue. Some relevant information:<p>* I’ve been using the llvm-13 toolchain packaged by Homebrew, since the toolchain included with macOS does not support OpenMP<p>* Simple OpenMP for loops behave as I would expect and seem to get scheduled on all cores. The issue seems to be with long-running processes.<p>* For the project in question, the runtime bottleneck is the OpenMP-accelerated solvers from AMGCL, and this is where I hit the issue of only getting scheduled on efficiency cores.<p><pre><code> \* https://github.com/ddemidov/amgcl
\* There don’t seem to be any unusual OpenMP primitives in AMGCL
</code></pre>
* When I use OMP_NUM_THREADS to run fewer than 10 threads (say 2 to 6), the process sometimes gets scheduled on performance cores, but this is not guaranteed.<p>What are some tools or leads for diagnosing why my simulation process tends to get scheduled onto efficiency cores? What are some options for changing how my process gets scheduled?
<p><pre><code> Accurately assigning QoS classes to tasks ensures that your app is both responsive and energy efficient on all Macs. On Apple silicon, a task’s QoS class influences whether the system runs that task on a performance core (P core) or efficiency core (E core). For example, the system is more likely to run Background tasks on E cores to maximize battery life. If you don’t assign QoS classes, your app’s responsiveness and efficiency may suffer as a result.
If you manually configure your thread’s priority using pthread_setschedparam, setpriority, or thread_set_policy, transition to APIs that set QoS classes instead. For example, use the pthread_set_qos_class_self_np function to set the QoS class of your POSIX threads.
</code></pre>
<a href="https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon?preferredLanguage=occ" rel="nofollow">https://developer.apple.com/documentation/apple-silicon/tuni...</a>
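<p>A minimal sketch of what the quoted advice might look like in an OpenMP program, assuming the worker threads are ordinary pthreads that can tag themselves at the top of a parallel region. The helper request_user_initiated_qos and the toy workload sum_of_squares are hypothetical names for illustration, not part of AMGCL or OpenMP, and I have not verified that this changes the scheduling behavior described in the question. The QoS call is guarded so the code still builds on non-Apple platforms, where it does nothing.

```cpp
#if defined(__APPLE__)
#include <pthread.h>
#include <pthread/qos.h>
#endif

// Hypothetical helper: ask the system to treat the calling thread as
// user-initiated work. Returns true on success. On non-Apple platforms
// it is a no-op that returns true.
bool request_user_initiated_qos() {
#if defined(__APPLE__)
    return pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) == 0;
#else
    return true;
#endif
}

// Hypothetical stand-in for the real solver work: sums i*i for i in 1..n.
// If the file is compiled without OpenMP support, the pragmas are ignored
// and the loop runs serially with the same result.
double sum_of_squares(int n) {
    double total = 0.0;
    #pragma omp parallel
    {
        // Each thread in the team tags itself before doing any work.
        request_user_initiated_qos();

        #pragma omp for reduction(+:total)
        for (int i = 1; i <= n; ++i) {
            total += static_cast<double>(i) * i;
        }
    }
    return total;
}
```

<p>Calling sum_of_squares(1000) from main should give 333833500. The QoS class is only a hint to the scheduler, so it is worth checking in Activity Monitor whether core placement actually changes on long runs.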