My two cents -- just use Cilk Plus:
<a href="https://www.cilkplus.org/tutorial-cilk-plus-keywords#spawn_and_sync" rel="nofollow">https://www.cilkplus.org/tutorial-cilk-plus-keywords#spawn_a...</a><p>Why I like it:<p>- easy to learn (3 keywords total: cilk_spawn, cilk_sync, cilk_for)<p>- the runtime handles thread creation and picks an appropriate number of worker threads for the hardware<p>- provably efficient work-stealing scheduler<p>- natively supported in GCC 5, with branches available for GCC 4.8/4.9 and Clang<p>- comes with a race detector (guaranteed to discover determinacy/data races in the execution for the inputs you run it on)<p>- trivial to convert your parallel code to serial (#define cilk_spawn and cilk_sync to nothing, and cilk_for to for)
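<p>To make the last point concrete, here is a rough sketch of what that looks like in C. The USE_CILK macro and the function names fib and fill_squares are my own placeholders, not anything from the Cilk Plus docs. I'm assuming a build along the lines of gcc -fcilkplus -DUSE_CILK for the parallel version; without that macro the three keywords are defined away and any C99 compiler builds the serial version.

```c
/* USE_CILK is a hypothetical build flag chosen for this sketch.
 * With it, the real Cilk Plus keywords from <cilk/cilk.h> are used.
 * Without it, the keywords are defined away (the "serial elision"). */
#ifdef USE_CILK
#include <cilk/cilk.h>
#else
#define cilk_spawn          /* spawned call becomes a plain call */
#define cilk_sync           /* sync becomes an empty statement   */
#define cilk_for for        /* parallel loop becomes a plain loop */
#endif

/* Naive Fibonacci: the first recursive call may run in parallel
 * with the second; cilk_sync waits for the spawned call to finish
 * before its result is read. */
long fib(int n)
{
    if (n < 2)
        return n;
    long a = cilk_spawn fib(n - 1);
    long b = fib(n - 2);
    cilk_sync;
    return a + b;
}

/* Fill out[i] = i*i with a parallel loop (iterations are independent),
 * then sum the results serially and return the total. */
long fill_squares(long *out, int n)
{
    cilk_for (int i = 0; i < n; ++i)
        out[i] = (long)i * i;

    long total = 0;
    for (int i = 0; i < n; ++i)
        total += out[i];
    return total;
}
```

Calling fib(10) should give 55 either way, since the serial and parallel builds compute the same thing. The sum in fill_squares is deliberately left as a plain loop: accumulating into a shared variable inside cilk_for would be exactly the kind of race the detector is there to catch.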