The main point here is that natural gradient descent is a second-order method. The natural gradient update is:<p>∇̃L(θ) = F⁻¹∇L(θ)<p>which requires solving the linear system F∇̃L(θ) = ∇L(θ) rather than inverting F directly. For this, you can use the methods from the authors' previous paper, [Thermodynamic Linear Algebra](<a href="https://arxiv.org/abs/2308.05660" rel="nofollow">https://arxiv.org/abs/2308.05660</a>).<p>Since it's hard to implement a full neural network on a thermodynamic computer, the paper suggests running one alongside a normal GPU: the GPU computes F and ∇L(θ), but offloads the linear solve to the thermodynamic computer, which runs in parallel with the digital system (Figure 1).<p>It's worth noting that the "Runtime vs Accuracy" plot in Figure 3 uses a "timing model" for the TNGD algorithm, since the hardware needed to actually run it doesn't exist yet.
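To make the division of labor concrete, here's a rough numpy sketch of my own (not code from the paper) of a single natural-gradient step; the dense solve in the middle is the piece that would be handed off to the thermodynamic solver:<p><pre><code>import numpy as np

def natural_gradient_step(theta, grad, jac, lr=1e-2, damping=1e-3):
    """One NGD step: solve (F + damping*I) x = grad for the natural gradient.

    grad : loss gradient, shape (p,)
    jac  : per-sample output Jacobian, shape (n, p), so that
           F ~= jac.T @ jac / n  (empirical Fisher / Gauss-Newton).
    """
    n = jac.shape[0]
    F = jac.T @ jac / n + damping * np.eye(theta.size)
    # This dense solve is the bottleneck the paper proposes to offload
    # to the analog hardware; here it's just numpy on the CPU.
    nat_grad = np.linalg.solve(F, grad)
    return theta - lr * nat_grad

# Toy usage on a quadratic loss with random data
rng = np.random.default_rng(0)
theta = rng.normal(size=8)
jac = rng.normal(size=(32, 8))
grad = jac.T @ (jac @ theta) / 32
theta = natural_gradient_step(theta, grad, jac)
</code></pre>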
Cool and interesting. The authors propose a hybrid digital-analog training loop that takes into account the curvature of the loss landscape (i.e., it uses second-order derivatives), and they show with numerical simulations that, if their method were implemented in a hybrid digital-analog physical system, each iteration of the training loop would incur a computational cost that is linear in the number of parameters. I'm all for figuring out ways to let the Laws of Thermodynamics do the work of training AI models, if doing so lets us overcome the scaling limitations and challenges of existing digital hardware and training methods.
I know they mainly present results on deep learning / neural network training and optimization, but I wonder how easy it would be to use the same optimization framework for other classes of hard or large optimization problems. I was also curious about this when I first saw posts about Extropic (<a href="https://www.extropic.ai/" rel="nofollow">https://www.extropic.ai/</a>).<p>I tried looking for any public info on their website about APIs or a software stack, to see what's possible beyond NN stuff for modeling other optimization problems. It looks like that's not shared publicly yet.<p>There are certainly plenty of NP-hard and large combinatorial or analytical optimization problems out there that would be worth tackling with new technology. Personally, I care about problems in EDA and semiconductor design. Adiabatic quantum computing was one technology that promised to solve optimization problems (and quantum computing is still playing out, with only small-scale solutions at the moment). I'm hoping these new "thermodynamic computing" startups might also provide some cool technology for exploring these problems.
Leveraging thermodynamics to compute second-order updates more efficiently is certainly cool and worth exploring; however, specifically in the context of deep learning, I remain skeptical of its usefulness.<p>We already have very efficient second-order methods running on classical hardware [1], but they are basically not used at all in practice, because they are outperformed by Adam and other first-order methods. This is because optimizing highly nonlinear loss functions, such as the ones in deep learning models, only really works with very low learning rates, regardless of whether a first- or second-order method is used. So, comparatively speaking, a second-order method might give you a slightly better parameter update per step, but at a more-than-slightly-higher cost per step, so most of the time it's simply not worth doing.<p>[1] <a href="https://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/" rel="nofollow">https://andrew.gibiansky.com/blog/machine-learning/hessian-f...</a>
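To put a number on the "more-than-slightly-higher cost": the core trick in [1] (Hessian-free optimization) is to never build the Hessian and instead run conjugate gradient using only Hessian-vector products. Here's a rough sketch of my own (the finite-difference HVP and the function names are just illustrative, not the article's code). Each CG iteration costs roughly two extra gradient evaluations, so even a modest 10-iteration inner solve makes one curvature-aware step an order of magnitude more expensive than a plain Adam/SGD step:<p><pre><code>import numpy as np

def hvp(grad_fn, theta, v, eps=1e-5):
    """Hessian-vector product via finite differences of the gradient:
    H v ~= (grad(theta + eps*v) - grad(theta - eps*v)) / (2*eps).
    Two gradient evaluations instead of forming the p x p Hessian."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def cg_solve(grad_fn, theta, b, iters=10, damping=1e-2):
    """Conjugate gradient on (H + damping*I) x = b using only HVPs --
    the core idea behind Hessian-free optimization."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(grad_fn, theta, p) + damping * p
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy usage: quadratic loss L(theta) = 0.5 * theta^T A theta
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8))
A = M @ M.T + np.eye(8)            # positive definite Hessian
grad_fn = lambda th: A @ th        # gradient of the quadratic
theta0 = rng.normal(size=8)
step = cg_solve(grad_fn, theta0, grad_fn(theta0))  # approx H^{-1} grad
</code></pre>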
Not having read the paper carefully, could someone tell me what the draw is? It looks like it has the same asymptotic complexity as SGD in terms of sample size, per Table 1. Given that today's large, over-parameterized models have numerous comparable extrema, is there even a need for this? I wouldn't get out of bed unless it were sublinear.
This reminds me of simulated annealing, which I learned about in an AI class about a decade ago.<p><a href="https://en.wikipedia.org/wiki/Simulated_annealing" rel="nofollow">https://en.wikipedia.org/wiki/Simulated_annealing</a>
I don't get it. Gradient descent computation happens constantly, and the state/inputs change all the time, so you'd have to reset the heat landscape very frequently. What's the point? No way there's any potential speedup opportunity there, no?<p>If anything, you could probably do something with electromagnetic fields and their interference, possibly in 3D.