> I don’t have numbers to prove this, but the Erlang implementation can also be made faster with a lot less effort since each recursive call can be run in parallel.<p>Ugh... untrue. Process creation is cheap in Erlang, but it's not that cheap. Naively spawning a process for every recursive call loses big time, because most calls do too little work to pay for the spawn. (And, yes, I've benchmarked this.)<p>You can get some gain by parallelizing only the first couple of levels of recursion (until you have about as many processes as cores) and then running the rest serially, but that uglifies the code quite a bit. And you're still not going to be as fast as the in-place C version.
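<p>For anyone wondering what the "parallel up to the number of cores, then serial" version looks like, here's a rough sketch. It's Python rather than Erlang, with threads standing in for Erlang processes, and the names (`parallel_quicksort`, `spawn_depth`) are made up for illustration. CPython's GIL means this won't actually run faster; it only shows the shape of the cutoff and how much extra code it drags in.

```python
import os
import threading


def quicksort(xs):
    """Plain serial quicksort (not in-place), used below the cutoff."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


def parallel_quicksort(xs, depth):
    """Spawn a worker for one half while depth > 0, then fall back to serial."""
    if depth <= 0 or len(xs) <= 1:
        return quicksort(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]

    result = {}

    def sort_smaller():
        # Runs in the worker thread; the parent sorts the other half itself.
        result["smaller"] = parallel_quicksort(smaller, depth - 1)

    worker = threading.Thread(target=sort_smaller)
    worker.start()
    sorted_larger = parallel_quicksort(larger, depth - 1)
    worker.join()
    return result["smaller"] + [pivot] + sorted_larger


def spawn_depth(cores=None):
    """Smallest depth d with 2**d >= cores, i.e. roughly one worker per core."""
    cores = cores or os.cpu_count() or 1
    return max(0, (cores - 1).bit_length())
```

You'd call it as `parallel_quicksort(data, spawn_depth())`. Note that the serial version is one short function and the parallel one needs a second function, a depth counter, and a join, which is the uglification I mean.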