These kinds of geometric solutions are easier to grasp than the traditional approaches. If geometry is not your thing but programming is, I really recommend
Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares [0] by Stanford's Professor Boyd; he has also written about fast linear-algebra-based methods that are guaranteed to converge when certain criteria are met [1].<p>[0] <a href="http://vmls-book.stanford.edu/" rel="nofollow">http://vmls-book.stanford.edu/</a>
[1] <a href="https://arxiv.org/abs/1511.06324" rel="nofollow">https://arxiv.org/abs/1511.06324</a>
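For intuition only, here is a minimal sketch (Python/NumPy, with a made-up toy problem and hand-picked step sizes that are my assumptions, not anything from the book or paper) contrasting a direct least-squares solve with plain gradient descent plus heavy-ball momentum on the same objective:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))   # hypothetical overdetermined system
    b = rng.standard_normal(200)

    # Direct linear-algebra route: one solve gives the exact least-squares minimizer
    x_direct, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Iterative route: gradient descent with heavy-ball momentum on f(x) = 0.5*||Ax - b||^2
    x = np.zeros(50)
    v = np.zeros(50)
    lr, beta = 1e-3, 0.9                 # assumed step size and momentum, tuned by hand
    for _ in range(5000):
        grad = A.T @ (A @ x - b)         # gradient of the least-squares objective
        v = beta * v - lr * grad
        x = x + v

    print(np.linalg.norm(x - x_direct))  # small only if the iterative run has converged

On a small dense convex problem like this the direct solve is exact in one shot, while the iterative run only matches it if the step size and iteration count are chosen well; the trade-offs obviously look different at the scales the reply below asks about.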
How does it compare <i>in practice</i> to modern forms of stochastic gradient descent with momentum (e.g., RAdam SGD) when applied to challenging, multi-million-variable non-convex optimization problems?