Let me see if I can describe the laser part of the paper correctly. They made a laser pulse consisting of a bunch of different frequencies mixed together. The intensity of each frequency represents a controllable parameter of the system. The pulse was sent through a crystal that performs a complex transformation, mixing all the frequencies together in a nonlinear and noisy way. Then they measure the frequency spectrum of the output. By itself, this system performs computations of a sort, but they are not useful.<p>To make the computations useful, they first trained a conventional digital neural network to predict the outputs given the controllable input parameters. Then they arbitrarily assigned some of the controllable parameters to be the inputs and the rest to be the trainable weights. Then they used the crystal to run forward passes on the training data. After each forward pass, they used the trained digital network to do the backward pass and estimate the gradients of the outputs with respect to the weights. With those gradients they update the weights just like in a regular neural net.<p>Although the gradients computed by the digital network are not a perfect match to the real gradients of the physical system (which are unknown), they don't need to be perfect. Any drift is corrected because the forward pass is always run on the real physical system, and stochastic gradient descent is naturally pretty tolerant of noise and bias.<p>Since they're just using a neural net to estimate the behavior of the physical system rather than modeling it with physics, they can use literally any physical system, and the behavior of the system does not have to be known. The only requirement is that the system performs a complex nonlinear transformation on a bunch of controllable parameters to produce a bunch of outputs. They also demonstrate this using the vibrations of a metal plate.<p>It seems like this method may not lead to huge training speedups, since regular neural nets are still involved. But after training, the physical system is all you need to run inference, and that part can be super efficient.
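To make the training loop concrete, here is a minimal sketch of how I understand it. All names and shapes are my own illustration, not from the paper: `physical_system` is a fake stand-in for the real experiment, and `surrogate` is assumed to be an already-fitted digital model of it.

```python
import jax
import jax.numpy as jnp

def physical_system(x, theta):
    # Stand-in for the real experiment: drive the device with inputs x
    # and controllable parameters theta, measure the output spectrum.
    # Faked here with an arbitrary nonlinear map.
    return jnp.tanh(jnp.outer(x, theta).sum(axis=1))

def surrogate(params, x, theta):
    # Small MLP trained beforehand (on input/output pairs measured from
    # the device) to mimic physical_system. Assumed already fitted.
    h = jnp.tanh(params["W1"] @ jnp.concatenate([x, theta]) + params["b1"])
    return params["W2"] @ h + params["b2"]

def train_step(theta, params, x, y_true, lr=1e-2):
    # 1. Forward pass on the *real* physical system.
    y_phys = physical_system(x, theta)
    # 2. Error signal taken from the physical output (dL/dy for squared error).
    err = y_phys - y_true
    # 3. Backward pass through the *digital* surrogate: a vector-Jacobian
    #    product estimates dL/dtheta without knowing the device's physics.
    _, vjp = jax.vjp(lambda th: surrogate(params, x, th), theta)
    (g_theta,) = vjp(err)
    # 4. Ordinary SGD update of the controllable "weights".
    return theta - lr * g_theta

# Toy usage with randomly initialized (i.e. untrained) surrogate weights:
n_in, n_theta, hidden, n_out = 8, 8, 32, 8
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {
    "W1": 0.1 * jax.random.normal(k1, (hidden, n_in + n_theta)),
    "b1": jnp.zeros(hidden),
    "W2": 0.1 * jax.random.normal(k2, (n_out, hidden)),
    "b2": jnp.zeros(n_out),
}
theta = jnp.zeros(n_theta)
theta = train_step(theta, params, x=jnp.ones(n_in), y_true=jnp.zeros(n_out))
```

The key point is in steps 1-3: the error comes from the physical output, but the gradient direction comes from the surrogate, which is why the estimate only has to be roughly right.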
This uses a physical system with controllable parameters to compute the forward pass, and then,<p>> using a differentiable digital model, the gradient of the loss is estimated with respect to the controllable parameters.<p>So e.g. they have a tunable laser that shifts the spectrum of an encoded input based on a set of parameters, and then they update the parameters based on a gradient computed from a digital simulation of the laser (a physics-aware model).<p>When I read the headline, I imagined they had implemented backpropagation in a physical system.
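A toy version of that "differentiable digital model" idea: if you can write down even an approximate differentiable simulation of the device, autodiff gives you the parameter gradient directly. The spectral-shift model below is purely illustrative and not the paper's actual simulation.

```python
import jax
import jax.numpy as jnp

def laser_sim(theta, spectrum):
    # Toy tunable laser: theta = (shift, gain) acting on an encoded spectrum.
    shift, gain = theta
    bins = jnp.arange(spectrum.shape[0], dtype=jnp.float32)
    # Linear interpolation makes the frequency shift differentiable in theta.
    return gain * jnp.interp(bins - shift, bins, spectrum)

def loss(theta, spectrum, target):
    return jnp.mean((laser_sim(theta, spectrum) - target) ** 2)

# Gradient of the loss w.r.t. the controllable parameters, via autodiff:
grad_theta = jax.grad(loss)(jnp.array([1.5, 0.8]), jnp.ones(64), jnp.zeros(64))
```

The real device then runs the forward pass; only the gradient step goes through this simulation.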
If you can train a nonlinear physical system with this method, then in principle you could also train real brains. You can't update the parameters of the brain, but you can inject signals. Assuming real brains are black-box functions for which you could learn a noisy estimator of gradients, this could be used for neural implants that supplement lost brain functionality, or for a Matrix-like skill-loading system.
> Deep-learning models have become pervasive tools in science and engineering. However, their energy requirements now increasingly limit their scalability.[1]<p>They make this claim first, and cite one source. I haven't heard of this as an issue before. Is there anywhere else I could read more on this?<p>[1]<a href="https://arxiv.org/abs/2104.10350" rel="nofollow">https://arxiv.org/abs/2104.10350</a>
If they can scale it up to GPT-3-like sizes, it would be amazing. Foundation models like GPT-3 will be the operating system of tomorrow, but right now they are too expensive to run.<p>They can be trained once and then frozen, and you can develop new skills by learning control codes (prompts) or by adding a retrieval subsystem (a search engine in the loop).<p>If you could shrink such a foundation model to a single chip, something small and energy-efficient, then you could have all sorts of smart AI on edge devices.
Physical/analog computers always suffer from noise limiting their usefulness, so I think it would be natural to apply this to a network architecture that includes noise as an integral part, such as GANs or VAEs.
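For instance, a VAE's reparameterized sampling already isolates all the stochasticity in an external noise term, which is exactly the slot where a device's intrinsic noise could go. A minimal sketch; `read_device_noise` is a hypothetical placeholder, simulated here with a PRNG:

```python
import jax
import jax.numpy as jnp

def read_device_noise(key, shape):
    # Placeholder: stands in for noise measured from the analog hardware.
    # Simulated here with a standard normal sample.
    return jax.random.normal(key, shape)

def sample_latent(mu, log_var, key):
    # Reparameterization trick: the stochasticity lives entirely in eps,
    # so gradients flow through mu and log_var as usual, even if eps
    # comes from physical noise rather than a PRNG.
    eps = read_device_noise(key, mu.shape)
    return mu + jnp.exp(0.5 * log_var) * eps
```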