TechEcho

11 comments

cmovq over 1 year ago

Depending on the order of the arguments to min and max, you'll get an extra move instruction [1]:

std::min(max, std::max(min, v));

    maxsd xmm0, xmm1
    minsd xmm0, xmm2

std::min(std::max(v, min), max);

    maxsd xmm1, xmm0
    minsd xmm2, xmm1
    movapd xmm0, xmm2

For min/max on x86, if any operand is NaN the instruction copies the second operand into the first. So the compiler can't reorder the second case to look like the first (to leave the result in xmm0 for the return value).

The reason for this NaN behavior is that minsd is implemented to look like `(a < b) ? a : b`, where if either a or b is NaN the condition is false and the expression evaluates to b.

Possibly std::clamp has the comparisons ordered like the second case?

[1]: https://godbolt.org/z/coes8Gdhz
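To make the operand-order point concrete, here is a minimal scalar sketch of the semantics described above. The names `minsd_model`, `maxsd_model` and `check_operand_order` are made up for illustration, and the functions only model the NaN rule stated in the comment (the second operand wins whenever the comparison is false); they are not the instructions themselves.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar models of the rule described above: the result is the second
// operand whenever the comparison is false, which includes every
// comparison that involves NaN. (Hypothetical helper names.)
inline double minsd_model(double a, double b) { return (a < b) ? a : b; }
inline double maxsd_model(double a, double b) { return (a > b) ? a : b; }

// Swapping the operands changes the result when one of them is NaN,
// so a compiler may not reorder them freely under strict IEEE rules.
inline void check_operand_order() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(minsd_model(nan, 1.0) == 1.0);       // NaN first: second operand wins
    assert(std::isnan(minsd_model(1.0, nan)));  // NaN second: NaN is returned
    assert(maxsd_model(nan, 1.0) == 1.0);
    assert(std::isnan(maxsd_model(1.0, nan)));
}
```

Because the two argument orders are observably different for NaN, the extra `movapd` in the second variant is the price of preserving that difference.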

camblomquist over 1 year ago

I did a double take on this because I wrote a blog post about this topic a few months ago and came to a very different conclusion: that the results are effectively identical on clang, and gcc is just weird.

Then I realized that I was writing about compiling for ARM and this post is about x86. Which is extra weird! Why is the compiler better tuned for ARM than x86 in this case?

Never did figure out what gcc's problem was.

https://godbolt.org/z/Y75qnTGdr

celegans25 over 1 year ago

On gcc 13, the difference in assembly between the min(max()) version and std::clamp is eliminated when I add the -ffast-math flag. I suspect that the two implementations handle one of the arguments being NaN a bit differently.

https://gcc.godbolt.org/z/fGaP6roe9

I see the same behavior on clang 17 as well:

https://gcc.godbolt.org/z/6jvnoxWhb
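The NaN suspicion can be checked by hand. The sketch below compares the two min/max orderings with a comparison-based clamp written the way clamp is commonly formulated; `clamp_compare` is a hypothetical stand-in and not any particular standard library's code. Strictly speaking the standard does not promise anything for NaN arguments here, so the results reflect how the usual `<`-based definitions evaluate without -ffast-math.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Ordering 1: bounds first, value second.
inline double clamp_bounds_first(double v, double lo, double hi) {
    return std::min(hi, std::max(lo, v));
}

// Ordering 2: value first, bounds second.
inline double clamp_value_first(double v, double lo, double hi) {
    return std::min(std::max(v, lo), hi);
}

// Comparison-based clamp (assumed formulation, for illustration only).
inline double clamp_compare(double v, double lo, double hi) {
    return (v < lo) ? lo : (hi < v) ? hi : v;
}

inline void check_nan_handling() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(clamp_bounds_first(nan, 0.0, 1.0) == 0.0);      // NaN is replaced by lo
    assert(std::isnan(clamp_value_first(nan, 0.0, 1.0)));  // NaN passes through
    assert(std::isnan(clamp_compare(nan, 0.0, 1.0)));      // NaN passes through
}
```

For ordinary inputs all three agree. Only the NaN case separates them, which would be consistent with the difference disappearing once -ffast-math lets the compiler assume NaN never occurs.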

jeffbee over 1 year ago

Clang generates the shortest of these if you target sandybridge, or x86-64-v3, or later. The real article that's buried in this article is that compilers target k8-generic unless you tell them otherwise, and the features and cost model of Opteron are obsolete.

Always specify your target.

svantana over 1 year ago

I'm a heavy std::clamp user, but I'm considering replacing it with min+max because of the uncertainty about what will happen when lo > hi. On Windows it triggers an assertion, while other platforms just do a min+max in one order or the other. Of course, this should never happen, but it can be difficult to guarantee when the limits are derived from user inputs.
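One way to take the uncertainty out of the lo > hi case is to decide the policy explicitly. The sketch below uses hypothetical helper names and shows what each plain min/max ordering yields with swapped bounds, plus a variant that orders the bounds first. It is one possible policy, not a recommendation from the thread.

```cpp
#include <algorithm>
#include <cassert>

// max then min: with lo > hi the result is always hi.
inline double clamp_max_then_min(double v, double lo, double hi) {
    return std::min(std::max(v, lo), hi);
}

// min then max: with lo > hi the result is always lo.
inline double clamp_min_then_max(double v, double lo, double hi) {
    return std::max(std::min(v, hi), lo);
}

// Accepts the bounds in either order by sorting them first.
inline double clamp_unordered(double v, double a, double b) {
    const double lo = std::min(a, b);
    const double hi = std::max(a, b);
    return std::min(std::max(v, lo), hi);
}

inline void check_swapped_bounds() {
    // Bounds deliberately passed the wrong way round: lo = 10, hi = 0.
    assert(clamp_max_then_min(5.0, 10.0, 0.0) == 0.0);
    assert(clamp_min_then_max(5.0, 10.0, 0.0) == 10.0);
    assert(clamp_unordered(5.0, 10.0, 0.0) == 5.0);
}
```

Sorting the bounds costs two extra min/max operations, so it is probably best done once where the user input is validated instead of on every call.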

tambre over 1 year ago

Both recent GCC and Clang are able to generate the optimal version for std::clamp() if you add something like -march=znver1, even at -O1 [0]. Interesting!

[0] https://godbolt.org/z/YsMMo7Kjz

planede over 1 year ago

On a somewhat similar note, don't use std::lerp if you don't need its strong guarantees around rounding (monotonicity among other things).

https://godbolt.org/z/hzrG3s6T4
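To illustrate what those guarantees protect against, here is a sketch of two common hand-written formulas. The function names are made up, and neither is how std::lerp is implemented. The short form can miss the endpoint at t == 1. The two-product form is exact at both endpoints for finite inputs but, as far as I know, can lose monotonicity.

```cpp
#include <cassert>

// Shortest form: one subtract, one multiply, one add.
// Not guaranteed to return exactly b when t == 1.
inline double lerp_naive(double a, double b, double t) {
    return a + t * (b - a);
}

// Exact at t == 0 and t == 1 for finite inputs, at the cost of an
// extra multiply.
inline double lerp_two_products(double a, double b, double t) {
    return (1.0 - t) * a + t * b;
}

inline void check_endpoints() {
    // 1.0 - 1e16 rounds to -1e16 in double precision, so the naive
    // form lands on 0.0 instead of b at t == 1.
    assert(lerp_naive(1e16, 1.0, 1.0) == 0.0);
    assert(lerp_two_products(1e16, 1.0, 1.0) == 1.0);
    assert(lerp_two_products(1e16, 1.0, 0.0) == 1e16);
}
```

If inputs are well scaled and an endpoint off by a rounding error is acceptable, the short form is usually enough, which is the trade-off the comment points at.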

CountHackulus over 1 year ago

I see that the assembly instructions are different, but what's the performance difference? Personally, I don't care about the number of instructions used, as long as it's faster. With things like store forwarding and register files, a lot of those movs might be treated as no-ops.

superjan over 1 year ago

The only time I worry about min/max/clamp performance is when I need to do thousands or millions of them. And in that case, I'd suggest intrinsics. You get to choose how NaN is handled, it's branchless, and you can do multiple in parallel.

It feels backwards that you need to order your comparisons so as to generate optimal assembly.
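As a rough sketch of the intrinsics route, the function below clamps an array two doubles at a time with SSE2 and falls back to scalar code elsewhere. `clamp_in_place` is a hypothetical name, and mapping NaN to lo is one policy picked for illustration. Swapping the operands of `_mm_max_pd` and `_mm_min_pd` would instead let NaN pass through.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Clamp every element to [lo, hi] in place. NaN inputs become lo:
// both the SSE2 instructions and the scalar fallback yield their
// second operand when the comparison involves NaN.
inline void clamp_in_place(std::vector<double>& data, double lo, double hi) {
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128d vlo = _mm_set1_pd(lo);
    const __m128d vhi = _mm_set1_pd(hi);
    for (; i + 2 <= data.size(); i += 2) {
        __m128d v = _mm_loadu_pd(&data[i]);
        v = _mm_max_pd(v, vlo);  // NaN lane in v -> lo
        v = _mm_min_pd(v, vhi);
        _mm_storeu_pd(&data[i], v);
    }
#endif
    // Scalar tail (and full fallback on targets without SSE2).
    for (; i < data.size(); ++i) {
        double v = data[i];
        v = (v > lo) ? v : lo;
        v = (v < hi) ? v : hi;
        data[i] = v;
    }
}

inline void check_clamp_in_place() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> xs = {-1.0, 0.5, 2.0, nan, 3.0};
    clamp_in_place(xs, 0.0, 1.0);
    assert(xs == (std::vector<double>{0.0, 0.5, 1.0, 0.0, 1.0}));
}
```

With wider targets (AVX and up) the same shape applies with the corresponding 256-bit intrinsics. Whether it beats what the compiler auto-vectorizes is something to measure per target.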

nickysielicki over 1 year ago

https://bugs.llvm.org/show_bug.cgi?id=47271

This specific test (click the godbolt links) does not reproduce the issue.

fooker over 1 year ago

If you benchmark these, you'll likely find the version with the jump edges out the one with the conditional instruction in practice.

std::clamp generates less efficient assembly than std::min(max, std::max(min, v))
