It is an interesting result; I worked on the design of efficient computational engines for predictive-coding-type models almost two decades ago. Intuitively it seemed like this should be the case, but it was never demonstrated.<p>Naive implementations were slow even at the time, because the obvious ways of doing prediction error minimization for these computational structures are fairly pathological in multiple dimensions. Somewhere in my dusty research archives is a new mechanism for close-to-optimal prediction error minimization at scale, at a <i>much</i> higher efficiency, enough to make it practical. This might motivate me to revive that work; few people were interested at the time.
For those asking about ImageNet results: this method is about 100x more computationally intensive than regular backprop. The code seems legit at first glance (in PyTorch, though they pretty much implemented everything from scratch).<p>The main result is that the local, Hebbian-like learning rule converges to exactly the same gradients as the ones produced by backprop.
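To give a flavour of what that equivalence looks like in code, here is a toy sketch I put together (not their implementation): a two-layer predictive coding network in PyTorch, where the hidden activity is relaxed on the prediction-error energy with the predictions and activation derivatives frozen at their feedforward values (one of the simplifying assumptions under which the equivalence becomes exact), after which the purely local updates (prediction error times presynaptic activity) line up with backprop's gradients. All layer sizes, step sizes and iteration counts below are arbitrary.

  import torch

  torch.manual_seed(0)
  x0 = torch.randn(1, 4)            # input
  target = torch.randn(1, 3)        # output node is clamped to this
  W0 = torch.randn(8, 4) * 0.1
  W1 = torch.randn(3, 8) * 0.1
  f = torch.tanh
  df = lambda z: 1 - torch.tanh(z) ** 2

  # Feedforward pass; these also serve as the (fixed) prediction values.
  mu1 = f(x0) @ W0.T
  mu2 = f(mu1) @ W1.T

  # Backprop reference for the squared-error loss 0.5 * ||output - target||^2.
  A0 = W0.clone().requires_grad_(True)
  A1 = W1.clone().requires_grad_(True)
  out = f(f(x0) @ A0.T) @ A1.T
  loss = 0.5 * ((out - target) ** 2).sum()
  loss.backward()

  # Inference phase: relax the hidden value node x1 on the prediction-error
  # energy, with the top-layer error and the activation derivative frozen at
  # their feedforward values (the simplification that makes the match exact).
  eps2 = target - mu2               # top-layer error (output clamped)
  x1 = mu1.clone()
  for _ in range(300):
      eps1 = x1 - mu1
      x1 = x1 - 0.2 * (eps1 - (eps2 @ W1) * df(mu1))
  eps1 = x1 - mu1                   # hidden-layer error at equilibrium

  # Local, Hebbian-like updates: prediction error times presynaptic activity.
  # These are descent directions on the energy, so compare them with the
  # negated loss gradients from autograd.
  dW1 = eps2.T @ f(mu1)
  dW0 = eps1.T @ f(x0)

  print(torch.allclose(dW1, -A1.grad, atol=1e-5))   # True
  print(torch.allclose(dW0, -A0.grad, atol=1e-5))   # True

The "100x" overhead is visible even here: every weight update needs an inner relaxation loop over the value nodes, where backprop gets the same numbers in a single backward pass.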
For those curious about Predictive Coding, I'd highly recommend reading two of the key source papers on it:<p>1. James C. R. Whittington and Rafal Bogacz. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Computation, 29(5):1229–1262, 2017.<p>2. Rafal Bogacz. A tutorial on the free-energy framework for modelling perception and learning. Journal of Mathematical Psychology, 76:198–211, 2017.<p>(source: grad student working on PC networks. happy to chat if anyone wants)