TechEcho
GCC vs. Clang/LLVM: An In-Depth Comparison of C/C++ Compilers

28 points by flipchart about 5 years ago

1 comment

ncm about 5 years ago
One key point where GCC and Clang differ is their willingness to produce 'cmov' ("conditional move") instructions. ('cmov' is an instruction that implements '(a=c?b:a)' without a conditional branch.)

Partition is an essential component of numerous key algorithms; almost half of the C++ STL algorithms depend on it. A key primitive for implementing partition is the 'swap_if' operation, so the performance of this operation directly determines the performance of many important algorithms. For best performance with word-sized arguments, 'swap_if' must be implemented with a pair of back-to-back 'cmov' instructions.

GCC got its fingers burned a few years ago by emitting 'cmov' in loop contexts where the condition was predictable and the result was needed immediately in the next iteration, creating dependency chains that slowed execution of some benchmarks by a factor of 2. Now GCC will never, under any circumstances, emit two 'cmov' instructions in one basic block.

Clang is happy to emit a pair of back-to-back 'cmov' instructions, which lets it perform partition operations very substantially faster than GCC. However, it fails to generate 'cmov' instructions in important cases.

The fully general expression of 'swap_if' looks like:

    template <typename T>
    bool swap_if(bool c, T& a, T& b) {
        T v[2] = { a, b };
        b = v[1-c], a = v[c];
        return c;
    }

It may be used in the inner loop of quicksort's partition operation as

    right += swap_if(*left < pivot, *left, *right);

A 'swap_if' implemented with 'cmov' makes quicksort fully twice as fast as, e.g., current 'std::sort'. However, neither GCC nor Clang recognizes this 'swap_if' as a place to substitute 'cmov' instructions for the pointer/offset accesses. (The latter are slower, probably because they produce more L1 bus traffic.)

Clang will happily emit 'cmov' instructions for a specialization like

    template <>
    bool swap_if(bool c, int& a, int& b) {
        int v[2] = { a, b };
        // -c is all ones when c is true; c-1 is all ones when c is false.
        a = (-c & v[1]) | ((c-1) & v[0]);
        b = (-c & v[0]) | ((c-1) & v[1]);
        return c;
    }

where GCC emits exactly the 'and' and 'or' instructions. (Interestingly, GCC's code for both definitions, despite failing to use 'cmov', runs quicksort quite a lot faster than std::sort; just not 2x as fast.) Clang will also emit 'cmov' instructions for

    bool swap_if(bool c, int& a, int& b) {
        int ta = a, tb = b;
        a = c ? tb : ta;
        b = c ? ta : tb;
        return c;
    }

but GCC will only produce very slow (i.e. badly predicted) branches for this case.

If we had 'std::swap_if' in the C++ Standard Library, probably all implementations would produce good code for it.

(As an aside, I find it odd that changing the assignment line of the template 'swap_if' above to

    a = v[c], b = v[!c];

makes it much, much slower when compiled with GCC for recent Intel. I would welcome any insight into why this is.)
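
For anyone who wants to try this, here is a minimal sketch of how the generic 'swap_if' might slot into a complete quicksort. The comment only shows the one-line use inside the partition loop, so everything around it is an assumption: the Lomuto-style loop, the last-element pivot, and the names 'quicksort', 'scan' and 'store' are hypothetical, not the commenter's actual code.

```cpp
#include <utility>
#include <vector>

// Generic swap_if as given in the comment: swaps a and b when c is true,
// using array indexing instead of a branch on c.
template <typename T>
bool swap_if(bool c, T& a, T& b) {
    T v[2] = { a, b };
    b = v[1 - c], a = v[c];
    return c;
}

// Hypothetical quicksort over [first, last) with a Lomuto-style partition.
// The pivot choice (last element) is the simplest possible, not a tuned one.
template <typename T>
void quicksort(T* first, T* last) {
    if (last - first < 2) return;
    T* pivot_pos = last - 1;
    T pivot = *pivot_pos;  // copy, since elements move during partitioning
    T* store = first;      // everything before store is < pivot
    for (T* scan = first; scan != pivot_pos; ++scan) {
        // The only data-dependent decision in the loop goes through swap_if;
        // store advances by 0 or 1 with no branch written in the source.
        store += swap_if(*scan < pivot, *scan, *store);
    }
    std::swap(*store, *pivot_pos);  // put the pivot in its final place
    quicksort(first, store);
    quicksort(store + 1, last);
}

// Convenience wrapper (also hypothetical) for sorting a whole vector.
template <typename T>
void quicksort(std::vector<T>& xs) {
    quicksort(xs.data(), xs.data() + xs.size());
}
```

Whether the loop body actually compiles to 'cmov' depends on the compiler, its version, and the flags, as described above; the sketch only guarantees the sorting behavior, so checking the generated assembly (for example with '-O2 -S') is the way to see what a given compiler does.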