科技回声

6 comments

tbirdz, over 9 years ago

Is this really that crazy? A 64-bit immediate value takes 8 bytes, so 8 bytes of that 10-byte instruction are the value to load into the register. Similarly, a 32-bit immediate value takes 4 bytes, so 4 bytes of the other instructions are the 32-bit immediate values. Taking this into account, we see that the non-immediate overhead is 1 byte for movl, 2 bytes for movq, and 2 bytes for movabsq.

I don't really think this is as crazy as the article is implying.
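A rough Python sketch of that arithmetic, not part of the original comment. The byte sequences are hand-assembled encodings with eax/rax as the destination, which is an assumption since the article's exact operands are not quoted in this thread, and the helper names are hypothetical. With the 7-byte sign-extended form mentioned further down the thread, the overhead for movq comes out at 3 bytes rather than 2.

```python
import struct

# Hand-assembled encodings; destination register eax/rax is assumed.

def movl_eax(imm32):
    # B8+rd id: mov r32, imm32
    return b"\xB8" + struct.pack("<I", imm32 & 0xFFFFFFFF)

def movq_rax_sext(imm32):
    # REX.W C7 /0 id: mov r/m64, imm32 (immediate is sign-extended)
    return b"\x48\xC7\xC0" + struct.pack("<i", imm32)

def movabsq_rax(imm64):
    # REX.W B8+rd io: mov r64, imm64
    return b"\x48\xB8" + struct.pack("<Q", imm64 & 0xFFFFFFFFFFFFFFFF)

def overhead(encoding, imm_size):
    # Bytes that are not part of the immediate value.
    return len(encoding) - imm_size

for name, enc, imm_size in [
    ("movl", movl_eax(1), 4),
    ("movq", movq_rax_sext(-1), 4),
    ("movabsq", movabsq_rax(1 << 33), 8),
]:
    print(name, len(enc), "bytes total,", overhead(enc, imm_size), "bytes overhead")
```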

gsg, over 9 years ago

Populating x86-64 floating point registers is also an amusing subject.

The obvious instruction for loading a (64-bit) float into an xmm register is movsd. With a memory source operand, the higher part of the register is zeroed, which is what you want. No problem.

Now the fun part: if the source is not memory but another xmm register, the higher part of the register is not zeroed. This induces a false dependency on the previous value of the destination register that can cause performance issues. To avoid this problem, such register-register copies should be done with a packed move instruction. (Or vmovsd, but that was added much later.)

The obvious packed move instruction for 64-bit floats is movapd, but we can do better than that by using movaps - it is still a float domain instruction but is a byte smaller.

So the optimal way to move a single double from one register to another is to use a vector move of the wrong type.
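To illustrate the "a byte smaller" point, here is a small Python sketch (not from the original comment) comparing hand-assembled register-to-register encodings. The choice of xmm0 as destination and xmm1 as source is an assumption, and the table and helper names are hypothetical.

```python
# Hand-assembled encodings for xmm0 <- xmm1 (ModRM byte C1).
# Register choice is an assumption made for illustration.
ENCODINGS = {
    "movsd": bytes.fromhex("F20F10C1"),   # F2 0F 10 /r
    "movapd": bytes.fromhex("660F28C1"),  # 66 0F 28 /r
    "movaps": bytes.fromhex("0F28C1"),    # 0F 28 /r (no mandatory prefix)
}

def encoded_length(mnemonic):
    # Number of bytes in the register-to-register form.
    return len(ENCODINGS[mnemonic])

for mnemonic in ENCODINGS:
    print(f"{mnemonic} xmm0, xmm1: {encoded_length(mnemonic)} bytes")
```

The saving comes from movaps having no mandatory prefix byte, whereas movapd carries 66 and movsd carries F2.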

waterhouse, over 9 years ago

> "it is impossible to add 2^33 to rax using one instruction only."

This can in fact be done, with a memory operand. I'm not sure about the performance compared to a 64-bit load immediate followed by an add, but this will do it (NASM syntax):

<pre><code>    add rax, [rel the_constant]
    ...
the_constant:
    dq 8589934592</code></pre>
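For context on why the constant has to come from memory or a register, a minimal Python sketch (not from the original comment): the immediate form of a 64-bit add carries at most a sign-extended 32-bit value, and 2^33 does not fit in that range. The helper names here are hypothetical, and `add64` only models the wrap-around arithmetic, not flags.

```python
MASK64 = (1 << 64) - 1

def fits_simm32(value):
    # True if value is representable as a sign-extended 32-bit immediate.
    return -(1 << 31) <= value < (1 << 31)

def add64(rax, operand):
    # Model a 64-bit add with wrap-around (flags are ignored).
    return (rax + operand) & MASK64

THE_CONSTANT = 8589934592  # the dq value from the comment, equal to 2**33

print(fits_simm32(THE_CONSTANT))
print(hex(add64(1, THE_CONSTANT)))
```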

justin_, over 9 years ago

For those wondering what he meant by the "data dependency" issue that zeroing out the upper 32 bits avoids, the first answer to this SO question does a decent job of explaining it: <a href="https://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register" rel="nofollow">https://stackoverflow.com/questions/11177137/why-do-most-x64...</a>

WalterBright, over 9 years ago

It's actually 7 bytes, not 6, to load a sign-extended value into a register.

<pre><code>48 C7 C0 FF FF FF FF    mov RAX,0FFFFFFFFh</code></pre>
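A short Python sketch of what sign extension does to that 32-bit immediate (not from the original comment; the function name is hypothetical). Under the usual reading of the `48 C7 C0` encoding, the 4 immediate bytes are widened to 64 bits by copying bit 31 upward, so an immediate of FF FF FF FF ends up as all ones in the full register.

```python
def sign_extend_32_to_64(imm32):
    # Widen a 32-bit immediate to 64 bits by replicating bit 31.
    imm32 &= 0xFFFFFFFF
    if imm32 & 0x80000000:
        imm32 -= 1 << 32          # interpret as a negative 32-bit value
    return imm32 & 0xFFFFFFFFFFFFFFFF

encoding = bytes.fromhex("48C7C0FFFFFFFF")
immediate = int.from_bytes(encoding[3:], "little")

print(len(encoding))
print(hex(sign_extend_32_to_64(immediate)))
```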

transfire, over 9 years ago

Makes one miss the 6502.

Loading an x64 register, how hard could it be?
