Comparing the output for the example squaring function without and then with the compiler option -O (GCC 5.1.0 or Clang 3.6) is rather interesting.<p>The code goes from something rather crazy-looking to something that a human would write.
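For anyone who wants to reproduce this, here is a minimal sketch of the kind of function being discussed. I'm assuming it has roughly this shape; the name `square` and the `int` types are my guess, not necessarily the exact example from the site. Without optimization, compilers typically store the argument to the stack and reload it before multiplying; with -O that round trip usually disappears and only the multiply and return remain.

```c
/* Hypothetical stand-in for the example squaring function.
 * To compare the generated assembly locally (file name is an assumption):
 *   gcc -S square.c       (no optimization)
 *   gcc -O -S square.c    (with -O)
 */
int square(int num) {
    /* A single multiply; the interesting part is what the compiler emits around it. */
    return num * num;
}
```

The result of the function is the same either way; only the generated assembly differs.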
It's surprising that these compilers aren't clever enough to produce the optimized output by default.
> Compiler: x86 gcc 4.9.2<p>> Compiler options: -march=i686<p>> error: CPU you selected does not support x86-64 instruction set<p>Strange, I'm sure x86 is not x86_64