Depending on the order of the arguments to std::min and std::max, you'll get an extra move instruction [1]:<p>std::min(max, std::max(min, v));<p><pre><code> maxsd xmm0, xmm1
minsd xmm0, xmm2
</code></pre>
std::min(std::max(v, min), max);<p><pre><code> maxsd xmm1, xmm0
minsd xmm2, xmm1
movapd xmm0, xmm2
</code></pre>
For minsd/maxsd on x86, if either operand is NaN the instruction copies the second (source) operand into the first (destination). So the compiler can't reorder the second case to look like the first (to leave the result in xmm0 for the return value), because swapping the operands would change which value comes out when one of them is NaN.<p>The reason for this NaN behavior is that minsd is implemented to look like `(a < b) ? a : b`: if either a or b is NaN the condition is false, and the expression evaluates to b.<p>Possibly std::clamp has the comparisons ordered like the second case?<p>[1]: <a href="https://godbolt.org/z/coes8Gdhz" rel="nofollow">https://godbolt.org/z/coes8Gdhz</a>
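To make the NaN point concrete, here is a minimal sketch of the two orderings above as plain functions. The names `clamp_a`, `clamp_b` and `check_orderings` are made up for illustration. It assumes strict IEEE semantics (no -ffast-math). With a NaN input the first ordering returns the lower bound and the second returns NaN, which is why the compiler cannot treat them as interchangeable.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helpers mirroring the two orderings discussed above.
// std::max(a, b) is (a < b) ? b : a and std::min(a, b) is (b < a) ? b : a,
// so when an argument is NaN the comparison is false and the FIRST
// argument is returned.
inline double clamp_a(double v, double lo, double hi) {
    return std::min(hi, std::max(lo, v));   // NaN v -> lo
}

inline double clamp_b(double v, double lo, double hi) {
    return std::min(std::max(v, lo), hi);   // NaN v -> NaN
}

// Small self-check showing where the two orderings agree and disagree.
inline void check_orderings() {
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    assert(clamp_a(0.5, 0.0, 1.0) == clamp_b(0.5, 0.0, 1.0));  // same for ordinary input
    assert(clamp_a(qnan, 0.0, 1.0) == 0.0);                    // first ordering: lower bound
    assert(std::isnan(clamp_b(qnan, 0.0, 1.0)));               // second ordering: NaN
}
```

Calling `check_orderings()` from a test is enough to see the difference; for all non-NaN inputs with lo <= hi the two functions return the same value.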
I did a double take on this because I wrote a blog post about this topic a few months ago and came to a very different conclusion: the results are effectively identical on clang, and gcc is just weird.<p>Then I realized that I was writing about compiling for ARM and this post is about x86. Which is extra weird! Why is the compiler better tuned for ARM than x86 in this case?<p>Never did figure out what gcc's problem was.<p><a href="https://godbolt.org/z/Y75qnTGdr" rel="nofollow">https://godbolt.org/z/Y75qnTGdr</a>
On gcc 13, the difference in assembly between the min(max()) version and std::clamp is eliminated when I add the -ffast-math flag. I suspect that the two implementations handle a NaN argument a bit differently.<p><a href="https://gcc.godbolt.org/z/fGaP6roe9" rel="nofollow">https://gcc.godbolt.org/z/fGaP6roe9</a><p>I see the same behavior on clang 17 as well.<p><a href="https://gcc.godbolt.org/z/6jvnoxWhb" rel="nofollow">https://gcc.godbolt.org/z/6jvnoxWhb</a>
Clang generates the shortest of these if you target sandybridge, or x86-64-v3, or later. The real story buried in this article is that compilers target k8-generic unless you tell them otherwise, and the features and cost model of Opteron are obsolete.<p>Always specify your target.
I'm a heavy std::clamp user, but I'm considering replacing it with min+max because of the uncertainty about what happens when lo > hi (the standard leaves that case undefined). On Windows it triggers an assertion, while other platforms just do a min+max in one order or the other. Of course, this should never happen, but it can be difficult to guarantee when the limits are derived from user inputs.
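A minimal sketch of what such a replacement could look like, assuming you want a defined result even when the bounds arrive in the wrong order. All three names (`clamp_minmax`, `clamp_sorted`, `clamp_checked`) are hypothetical, and which policy is right depends on what lo > hi should mean in your application.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Plain min+max: well defined for any ordering of the bounds.
// If lo > hi, the outer std::min makes the result always hi.
template <typename T>
T clamp_minmax(T v, T lo, T hi) {
    return std::min(std::max(v, lo), hi);
}

// Alternative policy: treat the two bounds as an unordered pair and
// sort them first, so user input like (10, 0) behaves like (0, 10).
template <typename T>
T clamp_sorted(T v, T a, T b) {
    if (b < a) std::swap(a, b);
    return std::min(std::max(v, a), b);
}

// Debug-checked variant: asserts the precondition in debug builds and
// still falls back to the defined min+max result when NDEBUG is set.
template <typename T>
T clamp_checked(T v, T lo, T hi) {
    assert(!(hi < lo) && "clamp_checked: lo must not exceed hi");
    return clamp_minmax(v, lo, hi);
}
```

For example, `clamp_minmax(5, 10, 0)` yields 0 (the upper bound wins), while `clamp_sorted(5, 10, 0)` yields 5.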
Both recent GCC and Clang are able to generate the optimal version for std::clamp() if you add something like -march=znver1, even at -O1 [0]. Interesting!<p>[0] <a href="https://godbolt.org/z/YsMMo7Kjz" rel="nofollow">https://godbolt.org/z/YsMMo7Kjz</a>
On a somewhat similar note, don't use std::lerp if you don't need its strong guarantees around rounding (monotonicity among other things).<p><a href="https://godbolt.org/z/hzrG3s6T4" rel="nofollow">https://godbolt.org/z/hzrG3s6T4</a>
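For reference, a minimal sketch of the two cheaper formulations people usually reach for instead of std::lerp (which needs C++20). The names `lerp_fast`, `lerp_two_sided` and `check_lerp` are hypothetical. As I understand it, neither form gives you all of std::lerp's guarantees at once: the first may miss b by a rounding error at t == 1, and the second is exact at both endpoints for finite inputs but is not guaranteed to be monotonic in t.

```cpp
#include <cassert>

// One subtract, one multiply, one add. (b - a) can round, so the
// result at t == 1 is not guaranteed to be exactly b.
inline double lerp_fast(double a, double b, double t) {
    return a + t * (b - a);
}

// Two multiplies and an add. For finite inputs this returns exactly a
// at t == 0 and exactly b at t == 1.
inline double lerp_two_sided(double a, double b, double t) {
    return (1.0 - t) * a + t * b;
}

// Small self-check on values that are exactly representable.
inline void check_lerp() {
    assert(lerp_fast(0.0, 10.0, 0.5) == 5.0);
    assert(lerp_two_sided(1.0, 3.0, 0.0) == 1.0);
    assert(lerp_two_sided(1.0, 3.0, 1.0) == 3.0);
}
```

If you only ever interpolate values where a small endpoint error is harmless (colors, animation parameters), either of these is likely good enough.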
I see that the assembly instructions are different, but what's the performance difference? Personally, I don't care about the number of instructions used, as long as it's faster. With things like store forwarding and register files, a lot of those movs might be treated as no-ops.
The only time I worry about min/max/clamp performance is when I need to do thousands or millions of them. And in that case, I’d suggest intrinsics. You get to choose how NaN is handled, it’s branchless, and you can do multiple in parallel.<p>It feels backwards that you need to order your comparisons so as to generate optimal assembly.
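A rough sketch of what the intrinsics route could look like with SSE2, two doubles at a time. `clamp_array` and the two scalar helpers are hypothetical names, and the NaN policy here (NaN becomes lo) is just one choice: `_mm_max_pd` and `_mm_min_pd` return their second operand when either lane is NaN, so swapping the operand order would let NaN pass through instead. The scalar fallback is only there so the code still builds on targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2: _mm_min_pd, _mm_max_pd
#define CLAMP_HAVE_SSE2 1
#endif

// Scalar models of maxsd/minsd: the second operand is returned when
// either operand is NaN, because the comparison is then false.
inline double max_second_on_nan(double a, double b) { return (a > b) ? a : b; }
inline double min_second_on_nan(double a, double b) { return (a < b) ? a : b; }

// Clamps n doubles in place to [lo, hi]. NaN elements become lo.
inline void clamp_array(double* data, std::size_t n, double lo, double hi) {
    assert(!(hi < lo) && "clamp_array: lo must not exceed hi");
    std::size_t i = 0;
#ifdef CLAMP_HAVE_SSE2
    const __m128d vlo = _mm_set1_pd(lo);
    const __m128d vhi = _mm_set1_pd(hi);
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);  // unaligned load of two doubles
        v = _mm_max_pd(v, vlo);              // NaN lane -> lo (second operand)
        v = _mm_min_pd(v, vhi);
        _mm_storeu_pd(data + i, v);
    }
#endif
    // Tail element (or the whole array when SSE2 is unavailable).
    for (; i < n; ++i) {
        const double v = max_second_on_nan(data[i], lo);
        data[i] = min_second_on_nan(v, hi);
    }
}
```

Wider registers (AVX, AVX-512) follow the same pattern with the corresponding 256-bit or 512-bit intrinsics.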
<a href="https://bugs.llvm.org/show_bug.cgi?id=47271" rel="nofollow">https://bugs.llvm.org/show_bug.cgi?id=47271</a><p>This specific test (click the godbolt links) does not reproduce the issue.