I find it really interesting that the idiomatic Haskell implementations (basically math) are the best-performing, while the Rust-like Haskell implementation (using an IORef) is orders of magnitude slower. This is exactly what I want: describe the logic of the operation and leave the compiler to optimize for the hardware (in this case, a CPU that mutates registers). The Rust implementation assumes the underlying hardware has registers it can mutate, yet it is only about 15% faster than the Haskell implementation, which makes no such assumption.

In essence, this is why I love Haskell and choose it over Rust: it lets me write my application logic directly, without having to think in terms of mutation, and the generated code is still pretty fast. If GHC becomes well-optimized enough, it could render Rust obsolete, since "no runtime overhead" means little if the result is actually slower than Haskell (e.g. with LinearTypes, which removes the need for GC). Rust can't render Haskell obsolete, however, since Haskell's goal is essentially to let you write logic directly, using types as propositions and values as proofs. Haskell's goal is a qualitative one (execute logic) while Rust's is a quantitative one (performance, no runtime overhead), which is why Haskell could take Rust's place if GHC gains sufficiently in performance.
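For concreteness, roughly the contrast I mean looks like this (a minimal sketch, not the article's actual benchmark code; I'm assuming the task is just summing 1..1,000,000):

    import Data.IORef (modifyIORef', newIORef, readIORef)

    -- Idiomatic version: describe the result and let GHC turn it into a tight loop.
    sumIdiomatic :: Int
    sumIdiomatic = sum [1 .. 1000000]

    -- "Rust-like" version: explicit mutation through an IORef.
    sumMutable :: IO Int
    sumMutable = do
      ref <- newIORef 0
      mapM_ (\i -> modifyIORef' ref (+ i)) [1 .. 1000000 :: Int]
      readIORef ref

    main :: IO ()
    main = do
      print sumIdiomatic
      print =<< sumMutable

The first version says nothing about how the sum is computed; the second pins it to a particular execution strategy, and GHC has a much harder time seeing through the IORef.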
> But look again: C is taking 87 nanoseconds, while Rust and Haskell both take about 175 microseconds. It turns out that GCC is able to optimize this into a downward-counting loop, which drastically improves the performance. We can do similar things in Rust and Haskell to get down to nanosecond-level performance, but that's not our goal today. I do have to say: well done GCC.

Downward-counting or not, it is simply impossible for GCC to generate code that executes all 1,000,000 iterations of the loop in 87 ns. That would be 87 femtoseconds per iteration, on average.

More likely, GCC figured out how to collapse the entire loop into a closed-form expression that is a function of the loop length.
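Something like Gauss's formula, i.e. (assuming the loop sums 1..n; sketched in Haskell for brevity, though GCC of course derives it at the IR level):

    -- Closed-form sum of 1..n: the kind of expression GCC can substitute for the loop.
    closedFormSum :: Int -> Int
    closedFormSum n = n * (n + 1) `div` 2

    -- Sanity check on the arithmetic: 87 ns spread over 1,000,000 iterations
    -- is 8.7e-14 s, i.e. 87 femtoseconds per iteration.
    perIteration :: Double
    perIteration = 87e-9 / 1e6

    main :: IO ()
    main = print (closedFormSum 1000000, perIteration)

Constant work regardless of the loop length, which is the only way to land at 87 ns.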
I feel like the author has felt obliged to include the full results, which is noble, but it's mostly obscuring the interesting results.

What does it matter if the "cheating" versions are faster, since they're doing something completely different? (OK, in principle it could be the same with an unrealistically magical optimizer.)

Seems to me the key point is that a bunch of high-level constructs in both Rust and Haskell are very nearly as fast as a tight loop in C. That's great!

The versions that are much slower don't seem very surprising, as they involve boxing the ints. (Edit to add: OK, reading more closely, I guess 'Haskell iterator 5' is interesting to dig into.)
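(Roughly what I mean by boxing, as a Haskell sketch rather than any of the article's actual variants: a lazy accumulator carries boxed thunks, while a strict one can stay unboxed in a register, at least once GHC's worker/wrapper transformation kicks in.)

    {-# LANGUAGE BangPatterns #-}

    -- Lazy accumulator: without strictness analysis, each step builds a boxed thunk.
    sumLazy :: [Int] -> Int
    sumLazy = foldl (+) 0

    -- Strict accumulator: GHC can keep the Int unboxed in a register.
    sumStrict :: [Int] -> Int
    sumStrict = go 0
      where
        go !acc []       = acc
        go !acc (x : xs) = go (acc + x) xs

    main :: IO ()
    main = print (sumLazy [1 .. 1000000], sumStrict [1 .. 1000000])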