This series is well written. The speed-up is expected because you're incorporating second-order information during the optimisation. For more insight into second-order optimisation methods, take a look at Newton's method: <a href="https://en.wikipedia.org/wiki/Newton%27s_method" rel="nofollow">https://en.wikipedia.org/wiki/Newton%27s_method</a>. The intuition, derivation, and proofs of correctness and convergence speed are quite illuminating.
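To make the point about second-order information concrete, here is a minimal sketch, not taken from the series itself. It minimises a toy convex function, f(x) = x^2 + exp(x), once with plain gradient descent and once with Newton's method. The function, the step size of 0.1, the starting point, and the tolerance are all assumptions chosen for illustration.

```python
import math

# Toy objective (an assumption for illustration): f(x) = x^2 + exp(x)
def grad(x):
    """First derivative f'(x)."""
    return 2.0 * x + math.exp(x)

def hess(x):
    """Second derivative f''(x); always positive, so f is convex."""
    return 2.0 + math.exp(x)

def minimise(step, x0=1.0, tol=1e-10, max_iter=10_000):
    """Iterate x <- x - step(x, g) until |f'(x)| < tol.

    Returns the final x and the number of updates performed.
    """
    x = x0
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, i
        x = x - step(x, g)
    return x, max_iter

# First-order: fixed learning rate, uses only the gradient.
x_gd, n_gd = minimise(lambda x, g: 0.1 * g)

# Second-order: Newton step, rescales the gradient by the curvature.
x_newton, n_newton = minimise(lambda x, g: g / hess(x))

print(f"gradient descent: x = {x_gd:.8f} after {n_gd} updates")
print(f"Newton's method:  x = {x_newton:.8f} after {n_newton} updates")
```

Both runs should land on the same minimiser, with Newton's method needing far fewer updates on this problem. In higher dimensions the division by `hess(x)` becomes a linear solve against the Hessian matrix, which is where the extra cost per step comes from.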