What is the difference between Newton's Method and Gradient Descent?<p>Edit: Found an answer: <a href="https://www.quora.com/In-optimization-why-is-Newtons-method-much-faster-than-gradient-descent" rel="nofollow">https://www.quora.com/In-optimization-why-is-Newtons-method-...</a>
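To make the contrast concrete, here is a minimal sketch (mine, not from the linked answer) of the two update rules on a toy 1-D convex function; the function, step size, and starting point are arbitrary illustration choices:

```python
# Toy comparison of gradient descent vs Newton's method on f(x) = x^4 + x^2,
# which has its minimum at x = 0.
def f_prime(x):
    return 4 * x**3 + 2 * x       # gradient of f

def f_double_prime(x):
    return 12 * x**2 + 2          # curvature (the "Hessian" in one dimension)

x_gd, x_newton = 3.0, 3.0
for _ in range(10):
    # gradient descent: fixed step size along the negative gradient
    x_gd -= 0.01 * f_prime(x_gd)
    # Newton's method: rescale the step by the local curvature
    x_newton -= f_prime(x_newton) / f_double_prime(x_newton)

print(x_gd, x_newton)  # after the same number of steps, the Newton iterate is far closer to 0
```

The point the Quora answer makes is visible here: gradient descent takes a fixed-size step along the negative gradient, while Newton's method rescales the step by the local curvature, which is why it typically needs far fewer iterations near a well-behaved minimum.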
Since the author is reading, a few small typos, followed by one slightly more substantial comment: 'simgoid' should be 'sigmoid' (S-shaped); `x y = log(x) + log(y)` should be `log(x y) = log(x) + log(y)`; 'guarentee' should be 'guarantee'; 'recipricol' should be 'reciprocal'.<p>I would like to see some mention of the fact that the division by the gradient is a meaningless, purely formal <i>motivation</i> for the correct step (inverting the Hessian) that follows.
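For readers following along, the step the parent comment refers to, in standard notation (mine, not the article's): the one-dimensional Newton step applied to the stationarity condition f'(x) = 0, and its multivariate form, where the informal "division" becomes multiplication by the inverse Hessian:

```latex
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)},
\qquad
\theta_{k+1} = \theta_k - H(\theta_k)^{-1}\,\nabla f(\theta_k)
```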
Dialing up the complexity a bit from Newton's method, it would be interesting to know whether there are now better explanations of the conjugate gradient method online than this classic (or at least high-profile) intro: <a href="https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf" rel="nofollow">https://www.cs.cmu.edu/~quake-papers/painless-conjugate-grad...</a>
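For anyone who wants the destination before the 60-page journey, a minimal sketch of the linear conjugate gradient iteration that paper builds up to, for solving Ax = b with A symmetric positive definite (function and variable names are mine, not Shewchuk's):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (plain CG, no preconditioning)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x            # residual = negative gradient of 0.5 x^T A x - b^T x
    p = r.copy()             # first search direction is the steepest-descent direction
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact minimizer along the direction p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x

# Tiny usage example on a small SPD system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # should match np.linalg.solve(A, b)
```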
This is a really nice introduction to logistic regression, well done! My one quibble with the OP is the jump into Newton's method. Maybe a derivation to explain the five steps would help. Thanks!
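Not necessarily the OP's exact five steps, but the standard derivation they presumably compress: with sigmoid σ, design matrix X, labels y ∈ {0, 1}, and negative log-likelihood ℓ(θ), the gradient and Hessian work out to

```latex
\nabla \ell(\theta) = X^{\top}\bigl(\sigma(X\theta) - y\bigr),
\qquad
H(\theta) = X^{\top} S X,
\quad
S = \operatorname{diag}\bigl(\sigma(X\theta) \odot (1 - \sigma(X\theta))\bigr)
```

and one Newton iteration is then: compute σ(Xθ), form the gradient, form S and the Hessian, solve H Δ = ∇ℓ(θ), and update θ ← θ - Δ.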
I've searched everywhere for the difference between Newton's Method and Gradient Descent, but I couldn't find anything that useful. Can you suggest any website/article where I can learn the difference?
The author has far too little mathematical understanding to be teaching anybody (that’s my impression, at least). If you don’t understand Newton’s method before reading this, you won’t understand it afterwards. “A method for finding the roots of a polynomial”. Why polynomials? Does it always work? Is it fast? Why would following the tangent repeatedly be a good idea? “We take the inverse instead of the reciprocal because it’s a matrix”. Not impressed.
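To the parent's questions, for the record: Newton's method is not limited to polynomials; it applies to any differentiable function, and convergence near a root is typically quadratic, but it is only guaranteed locally (from a bad starting point it can diverge or cycle). A minimal sketch on a non-polynomial equation (my own toy example, not from the article):

```python
import math

def newton_root(f, f_prime, x0, iters=6):
    """Follow the tangent line repeatedly: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / f_prime(x)
        print(x)
    return x

# cos(x) = x is not a polynomial equation; Newton still converges rapidly from x0 = 1.0.
newton_root(lambda x: math.cos(x) - x,
            lambda x: -math.sin(x) - 1,
            x0=1.0)
```

The number of correct digits roughly doubles per iteration here, which is the quadratic convergence that makes "following the tangent repeatedly" a good idea when you start close enough.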