Trying to understand this.<p>Using the latencies from the Zen 1 instruction table (see <a href="https://www.agner.org/optimize/instruction_tables.pdf" rel="nofollow">https://www.agner.org/optimize/instruction_tables.pdf</a>):<p><pre><code> mov dword [rsi], eax ; MOV m,r latency is 4
add dword [rsi], 5 ; ADD m,i latency is 6
mov ebx, dword [rsi] ; MOV r,m latency is 4</code></pre>
Total = 14<p>Each instruction depends on the result of the previous one, so the three latencies should sum to the total cycle count for the sequence. Is this right? If so, how does Agner arrive at 15?<p>Then for Zen 2:<p><pre><code> mov dword [rsi], eax ; MOV m,r latency is 0 (rather than 4,
; because it is mirrored)
add dword [rsi], 5 ; ADD m,i cannot find an entry for this.
; Looks like there's a typo in the doc.
; I guess the latency is 1.
mov ebx, dword [rsi] ; MOV r,m latency is 0</code></pre>
Total = 1<p>Again, how does Agner make it add up to 2?<p>And for Intel Skylake:<p><pre><code> mov dword [rsi], eax ; MOV m,r latency is 2
add dword [rsi], 5 ; ADD m,i latency is 5
mov ebx, dword [rsi] ; MOV r,m latency is 2</code></pre>
Total = 9
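<p>To make the arithmetic explicit, here is a minimal sketch that sums the per-instruction latencies listed above and compares them with the totals quoted from Agner. The figures are only my readings of the tables (the Zen 2 ADD m,i value is a guess), and the names `chains`, `quoted` and `chain_latency` are made up for this illustration. It assumes a fully serial dependency chain, which is exactly the assumption in question.

```python
# Latency figures as read off the instruction tables in this comment.
# The Zen 2 "ADD m,i" value of 1 is a guess, not a table entry.
chains = {
    "Zen 1":   [("MOV m,r", 4), ("ADD m,i", 6), ("MOV r,m", 4)],
    "Zen 2":   [("MOV m,r", 0), ("ADD m,i", 1), ("MOV r,m", 0)],
    "Skylake": [("MOV m,r", 2), ("ADD m,i", 5), ("MOV r,m", 2)],
}

# Totals attributed to Agner in this comment (none is given for Skylake).
quoted = {"Zen 1": 15, "Zen 2": 2}


def chain_latency(chain):
    """Sum the latencies, assuming each instruction waits for the previous one."""
    return sum(latency for _, latency in chain)


totals = {cpu: chain_latency(chain) for cpu, chain in chains.items()}

# Difference between the quoted total and the naive sum, where a quote exists.
gaps = {cpu: quoted[cpu] - totals[cpu] for cpu in quoted}

for cpu, total in totals.items():
    if cpu in quoted:
        print(f"{cpu}: summed {total}, quoted {quoted[cpu]}, gap {gaps[cpu]}")
    else:
        print(f"{cpu}: summed {total}")
```

On these numbers the naive sum comes out one cycle short of the quoted total for both Zen 1 and Zen 2.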