This is the commit in the Rust repository that achieves the speedup by reducing branch misprediction:<p><a href="https://github.com/rust-lang/rust/commit/b7089e0dd3e988270f34f182d3749ea5fff5a18f">https://github.com/rust-lang/rust/commit/b7089e0dd3e988270f3...</a><p>Do I understand correctly why this works?<p>In the `if` condition on line 201, `child + i < v.len()` is easy to predict but `is_less(...)` is hard. Because of this, `child += 1` on line 202 needs to be walked back with a non-negligible probability. This wastes time.<p><pre><code> 200 // Choose the greater child.
201 if child + 1 < v.len() && is_less(&v[child], &v[child + 1]) {
202 child += 1;
203 }
204
205 // Stop if the invariant holds at `node`.
206 if !is_less(&v[node], &v[child]) {
</code></pre>
After moving the `is_less(...)` call from the `if` condition to the consequent on line 205 below, we still need to fetch both `child += ...` and `is_less(...)` but we are much less likely to need to walk these instructions back. We waste less time and we reach the `if` condition on line 209 sooner.<p><pre><code> 200 // Choose the greater child.
201 if child + 1 < v.len() {
202 // We need a branch to be sure not to out-of-bounds index,
203 // but it's highly predictable. The comparison, however,
204 // is better done branchless, especially for primitives.
205 child += is_less(&v[child], &v[child + 1]) as usize;
206 }
207
208 // Stop if the invariant holds at `node`.
209 if !is_less(&v[node], &v[child]) {</code></pre>