I thought it might be interesting to see how this effect changes with the size of the array being summed. How do the relative speeds change when operating out of L1 cache, L3 cache, and main memory? Does the lower speed of memory access overwhelm the overhead of the overflow checking?<p><pre><code> $ swift build --configuration release
$ cset proc -s nohz -e .build/release/reduce
# count (basic, reduce, unsafe basic, unsafe reduce)
1000 (0.546, 0.661, 0.197, 0.576)
10000 (0.403, 0.598, 0.169, 0.544)
100000 (0.391, 0.595, 0.194, 0.542)
1000000 (0.477, 0.663, 0.294, 0.582)
10000000 (0.507, 0.655, 0.337, 0.608)
100000000 (0.509, 0.655, 0.339, 0.608)
 1000000000 (0.511, 0.656, 0.345, 0.611)
$ swift build --configuration release -Xswiftc -Ounchecked
$ cset proc -s nohz -e .build/release/reduce
# count (basic, reduce, unsafe basic, unsafe reduce)
1000 (0.309, 0.253, 0.180, 0.226)
10000 (0.195, 0.170, 0.168, 0.170)
100000 (0.217, 0.203, 0.196, 0.201)
1000000 (0.292, 0.326, 0.299, 0.252)
10000000 (0.334, 0.337, 0.333, 0.337)
100000000 (0.339, 0.339, 0.340, 0.339)
 1000000000 (0.344, 0.344, 0.344, 0.344)
</code></pre>
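For context, the four columns presumably correspond to variants along these lines. This is my own sketch, not Lemire's exact code; I'm assuming "unsafe" means wrapping `&+` arithmetic (which skips Swift's overflow trap), though it could also refer to `UnsafeBufferPointer` traversal.

```swift
import Foundation

// "basic": a plain for-in loop using overflow-checked +.
func basicSum(_ a: [Int]) -> Int {
    var s = 0
    for x in a { s += x }        // traps on overflow
    return s
}

// "reduce": Sequence.reduce with overflow-checked +.
func reduceSum(_ a: [Int]) -> Int {
    return a.reduce(0, +)        // traps on overflow
}

// "unsafe basic": wrapping addition, no overflow check.
func unsafeBasicSum(_ a: [Int]) -> Int {
    var s = 0
    for x in a { s = s &+ x }    // wraps silently on overflow
    return s
}

// "unsafe reduce": reduce with the wrapping operator.
func unsafeReduceSum(_ a: [Int]) -> Int {
    return a.reduce(0, &+)
}

let a = Array(1...1000)
print(basicSum(a), reduceSum(a), unsafeBasicSum(a), unsafeReduceSum(a))
```

With `-Ounchecked` the compiler additionally drops the overflow checks from the `+` variants, which is why the two tables converge at large counts.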
Code is from <a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2016/12/05" rel="nofollow">https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...</a>, modified to loop over the different array lengths. Numbers are for Skylake at 3.4 GHz with swift-3.0.1-RELEASE-ubuntu16.04. Count is the number of 8-byte ints in the array being summed. Results shown were truncated by hand; I wasn't sure how to specify precision from within Swift. Running under "cset proc -s nohz" was meant to reduce jitter between runs, but it doesn't significantly affect total run time. The anomalously fast result for the L3-sized array in the 'unsafe', '-Ounchecked' case is consistent across runs.
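On the precision point: one way to get fixed decimal places from Swift is Foundation's printf-style `String(format:)`, which would have avoided the hand truncation. A minimal example (the variable name is illustrative):

```swift
import Foundation

// Format a Double to three decimal places using a printf-style
// format string, e.g. for the timing columns above.
let elapsed = 0.5464321
print(String(format: "%.3f", elapsed))  // prints "0.546"
```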