TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Why is 2 * (i * i) faster than 2 * i * i in Java?

424 点作者 trequartista超过 6 年前

16 条评论

userbinator超过 6 年前
<i>So it&#x27;s an issue of the optimizer; as is often the case, it unrolls too aggressively and shoots itself in the foot, all the while missing out on various other opportunities.</i><p>In my experience, loop unrolling should basically never be done except in extremely degenerate cases; I remember not long ago someone I know who also optimises Asm remarking &quot;it should&#x27;ve died along with the RISC fad&quot;. The original goal was to reduce per-iteration overhead associated with checking for end-of-loop, but any superscalar&#x2F;OoO&#x2F;speculative processor can &quot;execute past&quot; those instructions anyway; all that unrolling will do is bloat the code and work against caching. Memory bandwidth is often the bottleneck, not the core.
评论 #18575282 未加载
评论 #18575832 未加载
评论 #18576505 未加载
评论 #18575759 未加载
评论 #18575516 未加载
评论 #18576984 未加载
jepler超过 6 年前
You should translate your program to C++ and build with clang ; it turns the loop into a single constant load. <a href="https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;slznbU" rel="nofollow">https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;slznbU</a>
评论 #18575390 未加载
评论 #18577233 未加载
评论 #18577363 未加载
评论 #18575132 未加载
beeforpork超过 6 年前
With all the optimisations being implemented in compilers today, it is impressive to see how this opportunity to optimise is missed. Put differently, compiler writers bother about optimisations that gain 0.1% performance in some special cases, but others that could gain 20% performance are not implemented.<p>Why? Is this optimisation particularly difficult to implement? Or is it just missed low-hanging fruit? It sure looks easy (like: rearrange expressions to keep the expression tree shallow and left-branching to avoid stack operations).
评论 #18577408 未加载
评论 #18578538 未加载
评论 #18577333 未加载
评论 #18577083 未加载
techopoly超过 6 年前
That just might be the most dedicated answer I&#x27;ve ever seen on Stack Overflow.
评论 #18575091 未加载
评论 #18576225 未加载
评论 #18576214 未加载
dreamcompiler超过 6 年前
I thought at first this was because integer squaring is potentially faster than general integer multiplication and the compiler wasn&#x27;t seeing the square operation in the second case, but that&#x27;s not the explanation here.
评论 #18575575 未加载
ww520超过 6 年前
I&#x27;m surprised it&#x27;s not doing a left shift for the x2.
评论 #18574985 未加载
评论 #18574991 未加载
podsnap超过 6 年前
The graal behavior is a lot more sane:<p><pre><code> graal: [info] SoFlow.square_i_two 10000 avgt 10 5338.492 ± 36.624 ns&#x2F;op &#x2F;&#x2F; 2 *\sum i * i [info] SoFlow.two_i_ 10000 avgt 10 6421.343 ± 34.836 ns&#x2F;op &#x2F;&#x2F; \sum 2 * i * i [info] SoFlow.two_square_i 10000 avgt 10 6367.139 ± 34.575 ns&#x2F;op &#x2F;&#x2F; \sum 2 * (i * i) regular 1.8: [info] SoFlow.square_i_two 10000 avgt 10 6393.422 ± 27.679 ns&#x2F;op [info] SoFlow.two_i_ 10000 avgt 10 8870.908 ± 35.715 ns&#x2F;op [info] SoFlow.two_square_i 10000 avgt 10 6221.205 ± 42.408 ns&#x2F;op </code></pre> The graal-generated assembly for the first two cases is nearly identical, featuring unrolled repetitions of sequences like<p><pre><code> [info] 0x000000011433ec03: mov %r8d,%ecx [info] 0x000000011433ec06: shl %ecx ;*imul {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_two_i_@15 (line 41) [info] 0x000000011433ec08: imul %r8d,%ecx ;*imul {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_two_i_@17 (line 41) [info] 0x000000011433ec0c: add %ecx,%r9d ;*iadd {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_two_i_@18 (line 41) [info] 0x000000011433ec0f: lea 0x5(%r11),%r8d ;*iinc {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_two_i_@20 (line 40) </code></pre> while the third case does a single shl at the end.<p><pre><code> [info] 0x000000010e2918bb: imul %r8d,%r8d ;*imul {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_square_i_two@15 (line 32) [info] 0x000000010e2918bf: add %r8d,%ecx ;*iadd {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_square_i_two@16 (line 32) [info] 0x000000010e2918c2: lea 0x3(%r11),%r8d ;*iinc {reexecute=0 rethrow=0 return_oop=0} [info] ; - add.SoFlow::test_square_i_two@18 (line 31) </code></pre> Both graal and C2 inline, but as usual the graal output is a lot more comprehensible.
bnegreve超过 6 年前
I don&#x27;t see how generating different code for the same mathematical expression can be a good thing.<p>The compiler should detect that the two expressions are strictly equivalent and generate whatever code it believes is the fastest.<p>Any idea why it is this way?
评论 #18576431 未加载
评论 #18576659 未加载
评论 #18576295 未加载
crb002超过 6 年前
TIL about printing ASM from debug JVMs.
评论 #18575818 未加载
alkonaut超过 6 年前
Is Overflow UB so the compiler can choose to ignore the fact that 2x(i x i) could overflow differently from 2 x i x i?<p>I’m not sure it does overflow differently but I would expect overflow to behave consistently as written, and not be dependent on optimization, is that not the case?
评论 #18578844 未加载
Koshkin超过 6 年前
At first, I thought it was because i * i == -1.
microcolonel超过 6 年前
I guess they do not use value numbering, which is typically how you get equivalent results for cases like this.
qwerty456127超过 6 年前
IMHO some kind of logic preprocesor should take care of this before the actual compilation.
评论 #18578034 未加载
polskibus超过 6 年前
I wonder if the same applies to .net (fx&#x2F;core).
评论 #18578067 未加载
networkimprov超过 6 年前
Has anyone tried this with Go?
评论 #18576283 未加载
评论 #18576842 未加载
JohnL4超过 6 年前
The database is fast enough for a few extra trips to it, so this is definitely what we should be focusing on.<p>(My cup of bitterness doth overflow.)