科技回声 (Tech Echo)

A tech news platform built with Next.js, offering technology news and discussion from around the world.

© 2025 科技回声 (Tech Echo). All rights reserved.

Loading an x64 register, how hard could it be?

51 points by mpu, over 9 years ago

6 comments

tbirdz, over 9 years ago
Is this really that crazy? A 64-bit immediate value takes 8 bytes. So of that 10-byte instruction, 8 bytes are the value to load into the register. Similarly, a 32-bit immediate value takes 4 bytes, so 4 bytes of each of the other instructions are the 32-bit immediate. Taking this into account, we see that the non-immediate overhead is 1 byte for movl, 2 bytes for movq, and 2 bytes for movabsq.

I don't really think this is as crazy as the article is implying.
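To check the byte accounting above, here is a small Python sketch that hand-assembles the three forms for RAX/EAX from the Intel opcode tables (the helper names are hypothetical, and only RAX/EAX is handled). It reports 5, 7 and 10 bytes. Note that the sign-extended movq form needs a REX.W prefix, the C7 opcode and a ModRM byte, so its overhead comes out at 3 bytes rather than 2, which agrees with the 7-byte encoding WalterBright quotes further down the thread.

```python
import struct

def mov_eax_imm32(value: int) -> bytes:
    # B8+rd id: opcode B8 selects EAX; writing EAX zero-extends into RAX.
    return bytes([0xB8]) + struct.pack("<I", value & 0xFFFFFFFF)

def mov_rax_simm32(value: int) -> bytes:
    # REX.W C7 /0 id: ModRM 0xC0 selects RAX; the imm32 is sign-extended.
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a sign-extended imm32")
    return bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", value)

def movabs_rax_imm64(value: int) -> bytes:
    # REX.W B8+rd io: the full 8-byte immediate follows the opcode.
    return bytes([0x48, 0xB8]) + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)

def report(name: str, encoding: bytes, imm_size: int) -> None:
    # Everything that is not the immediate counts as overhead.
    overhead = len(encoding) - imm_size
    print(f"{name:<8} {len(encoding):2} bytes, overhead {overhead}: {encoding.hex(' ')}")

report("movl", mov_eax_imm32(1), 4)
report("movq", mov_rax_simm32(-1), 4)
report("movabsq", movabs_rax_imm64(2**33), 8)
```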
gsg, over 9 years ago
Populating x86-64 floating-point registers is also an amusing subject.

The obvious instruction for loading a (64-bit) float into an xmm register is movsd. With a memory source operand, the upper part of the register is zeroed, which is what you want. No problem.

Now the fun part: if the source is not memory but another xmm register, the upper part of the register is *not* zeroed. This induces a false dependency on the previous value of the destination register, which can cause performance issues. To avoid this problem, such register-register copies should be done with a packed move instruction. (Or vmovsd, but that was added much later.)

The obvious packed move instruction for 64-bit floats is movapd, but we can do better by using movaps: it is still a floating-point domain instruction but is a byte smaller.

So the optimal way to move a single double from one register to another is to use a vector move of the wrong type.
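gsg's point can be illustrated with a toy Python model (hypothetical, not an emulator) that treats an XMM register as a pair of 64-bit bit patterns, alongside the register-to-register encodings for copying xmm1 into xmm0, hand-written from Intel's opcode tables. The model shows why movsd reg-reg reads the old destination while movaps does not, and the table shows movaps being one byte shorter than movapd because it has no 66 prefix.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Xmm:
    # Raw 64-bit bit patterns of the low and high halves of a 128-bit register.
    low: int
    high: int

def movsd_from_memory(dst: Xmm, mem_qword: int) -> Xmm:
    # Memory source: upper half is zeroed, so the old dst is never read.
    return Xmm(mem_qword, 0)

def movsd_reg_reg(dst: Xmm, src: Xmm) -> Xmm:
    # Register source: only the low half is copied and dst.high survives,
    # so the result depends on the previous destination value.
    return Xmm(src.low, dst.high)

def movaps_reg_reg(dst: Xmm, src: Xmm) -> Xmm:
    # Full 128-bit copy: the old destination value is irrelevant.
    return Xmm(src.low, src.high)

# Encodings of "xmm0 = xmm1"; ModRM 0xC1 = mod 11, reg 000 (xmm0), rm 001 (xmm1).
ENCODINGS = {
    "movsd": bytes([0xF2, 0x0F, 0x10, 0xC1]),
    "movapd": bytes([0x66, 0x0F, 0x28, 0xC1]),
    "movaps": bytes([0x0F, 0x28, 0xC1]),
}

old_xmm0 = Xmm(low=111, high=222)
xmm1 = Xmm(low=42, high=0)
print(movsd_reg_reg(old_xmm0, xmm1))   # Xmm(low=42, high=222)
print(movaps_reg_reg(old_xmm0, xmm1))  # Xmm(low=42, high=0)
for name, enc in ENCODINGS.items():
    print(name, len(enc), enc.hex(" "))
```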
waterhouse, over 9 years ago
> "it is impossible to add 2^33 to rax using one instruction only."

This can in fact be done, with a memory operand. I'm not sure about the performance compared to a 64-bit load immediate followed by an add, but this will do it (NASM syntax):

        add rax, [rel the_constant]
        ...
    the_constant:
        dq 8589934592
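A small Python sketch of the size arithmetic behind this: 2^33 does not fit the sign-extended 32-bit immediate that ADD r/m64 accepts, so the options are a RIP-relative memory operand or a movabs into a scratch register followed by a register add. The byte counts are hand-encoded from Intel's opcode tables; RCX as the scratch register and the zero displacement are assumptions for illustration, and nothing here measures speed, which waterhouse leaves open.

```python
CONSTANT = 2**33  # 8589934592, the value from the quoted claim

def fits_simm32(value: int) -> bool:
    # ADD r/m64, imm32 (REX.W 81 /0 id) sign-extends its immediate to 64 bits.
    return -2**31 <= value < 2**31

# Option 1: ADD r64, r/m64 = REX.W 03 /r with a RIP-relative operand.
# ModRM 0x05 = mod 00, reg 000 (RAX), rm 101 (RIP + disp32).
add_rax_riprel = bytes([0x48, 0x03, 0x05]) + bytes(4)  # disp32: zero placeholder
data_bytes = 8                                          # the "dq" constant itself

# Option 2 (assumed scratch register RCX): movabs rcx, imm64 then add rax, rcx.
movabs_rcx = bytes([0x48, 0xB9]) + CONSTANT.to_bytes(8, "little")  # REX.W B8+1 io
add_rax_rcx = bytes([0x48, 0x01, 0xC8])  # REX.W 01 /r, ModRM 11 001 000

print("fits in imm32:", fits_simm32(CONSTANT))
print("memory operand:", len(add_rax_riprel), "code bytes +", data_bytes, "data bytes")
print("movabs + add:", len(movabs_rcx) + len(add_rax_rcx), "code bytes, clobbers rcx")
```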
justin_, over 9 years ago
For those wondering what he meant by the "data dependency" issue that zeroing out the upper 32 bits avoids, the first answer to this SO question does a decent job of explaining it: https://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register
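The architectural rule behind that answer can be sketched in Python. This is a toy model with hypothetical helper names that tracks only the resulting value of RAX, not how a CPU renames registers: a 32-bit write discards the old upper bits, while 16- and 8-bit writes must merge with them, so their result depends on the previous RAX.

```python
MASK64 = (1 << 64) - 1

def write_eax(old_rax: int, value: int) -> int:
    # 32-bit destination: bits 32..63 are zeroed; old_rax is never read.
    return value & 0xFFFFFFFF

def write_ax(old_rax: int, value: int) -> int:
    # 16-bit destination: bits 16..63 are preserved, so the result
    # merges with (depends on) the previous contents of RAX.
    return (old_rax & ~0xFFFF & MASK64) | (value & 0xFFFF)

def write_al(old_rax: int, value: int) -> int:
    # 8-bit destination: bits 8..63 are preserved.
    return (old_rax & ~0xFF & MASK64) | (value & 0xFF)

old = 0xDEADBEEF_CAFEBABE
print(hex(write_eax(old, 0x1234)))  # 0x1234
print(hex(write_ax(old, 0x1234)))   # 0xdeadbeefcafe1234
print(hex(write_al(old, 0x12)))     # 0xdeadbeefcafeba12
```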
WalterBright, over 9 years ago
It's actually 7 bytes, not 6, to load a sign-extended value into a register.

    48 C7 C0 FF FF FF FF    mov RAX,0FFFFFFFFh
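One subtlety in that listing: the C7 form sign-extends its 32-bit immediate, so although the disassembly shows 0FFFFFFFFh, RAX ends up with all 64 bits set (-1). A short Python sketch that decodes only this one instruction, with field checks written by hand from the Intel opcode tables, makes that explicit.

```python
import struct

encoding = bytes.fromhex("48 C7 C0 FF FF FF FF")  # bytes quoted in the comment

rex, opcode, modrm = encoding[0], encoding[1], encoding[2]
assert rex == 0x48     # REX.W: 64-bit operand size
assert opcode == 0xC7  # MOV r/m, imm32
# ModRM: mod 11 (register), reg field /0, rm 000 (RAX)
assert modrm >> 6 == 0b11 and (modrm >> 3) & 7 == 0 and modrm & 7 == 0

imm32 = struct.unpack("<i", encoding[3:7])[0]  # signed 32-bit immediate
rax = imm32 & 0xFFFFFFFFFFFFFFFF               # value after sign extension
print(len(encoding), hex(imm32 & 0xFFFFFFFF), hex(rax))
# 7 0xffffffff 0xffffffffffffffff
```

Loading the zero-extended value 0x00000000FFFFFFFF instead only needs mov eax, 0FFFFFFFFh, which encodes as B8 FF FF FF FF in 5 bytes.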
transfire, over 9 years ago
Makes one miss the 6502.