Leveraging thermodynamics to compute second-order updates more efficiently is certainly cool and worth exploring; however, specifically in the context of deep learning I remain skeptical of its usefulness.

We already have quite efficient second-order methods running on classical hardware [1], yet they are basically not used in practice because they are outperformed by Adam and other first-order methods. The reason is that optimizing highly nonlinear loss functions, such as those of deep learning models, only really works with very low learning rates, regardless of whether a first- or second-order method is used. So, comparatively speaking, a second-order method might give you a slightly better parameter update per step, but at a more-than-slightly-higher cost per step, so most of the time it is simply not worth doing.

[1] https://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/
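To make the cost argument concrete, here is a minimal sketch of a Hessian-free-style step in JAX (the names `loss_fn`, `cg_iters`, and `damping` are just illustrative placeholders, not anything from the linked post): the update solves H p = -g approximately with conjugate gradient, and each CG iteration needs one Hessian-vector product, which costs roughly as much as an extra gradient evaluation. One second-order step can therefore easily cost tens of gradient-equivalents, while an Adam step costs exactly one.

```python
import jax
import jax.numpy as jnp

# Toy quadratic-ish loss; stands in for a real deep-learning objective.
def loss_fn(params, x, y):
    pred = jnp.tanh(x @ params)
    return jnp.mean((pred - y) ** 2)

def hvp(params, x, y, v):
    # Hessian-vector product via forward-over-reverse autodiff;
    # roughly 2-3x the cost of a single gradient evaluation.
    return jax.jvp(lambda p: jax.grad(loss_fn)(p, x, y), (params,), (v,))[1]

def hessian_free_step(params, x, y, cg_iters=20, damping=1e-3):
    # Approximately solve (H + damping*I) p = -g with conjugate gradient.
    # Each CG iteration calls hvp once, so one second-order step costs
    # on the order of cg_iters gradient-equivalents.
    g = jax.grad(loss_fn)(params, x, y)
    matvec = lambda v: hvp(params, x, y, v) + damping * v
    p, _ = jax.scipy.sparse.linalg.cg(matvec, -g, maxiter=cg_iters)
    return params + p
```

For comparison, a first-order step only ever needs `jax.grad(loss_fn)(params, x, y)` once, which is why the per-step quality advantage of the curvature information has to be large before the extra work pays off.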