
Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning

91 points by ericjang, over 4 years ago

13 comments

notthemessiah, over 4 years ago
The author of this piece calls Dynamic Programming "one of the top three achievements of Computer Science"; however, it doesn't have much to do with computer science, as it's just a synonym for mathematical optimization, and the name was chosen seemingly for being "politically correct" (avoiding the wrath and suspicion of managers) at the RAND Corporation:

> I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, "Where did the name, dynamic programming, come from?" The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word "research". I'm not using the term lightly; I'm using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term research in his presence. You can imagine how he felt, then, about the term mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word "programming". I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities.

https://en.wikipedia.org/wiki/Dynamic_programming#History
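For readers who haven't seen it stated concretely, here is a minimal sketch of the kind of "multistage decision process" Bellman describes, solved by the backward Bellman recursion. The stages, states, and costs are invented purely for illustration.

    # Toy multistage decision process solved with the Bellman recursion:
    # cost_to_go[s] = min over actions of (step_cost + cost_to_go[next_state]).
    # The stages, states, and costs here are made up for illustration.

    stages = [
        {"A": {"B": 2, "C": 5}},                          # stage 0
        {"B": {"D": 4, "E": 1}, "C": {"D": 1, "E": 3}},   # stage 1
        {"D": {"GOAL": 2}, "E": {"GOAL": 6}},             # stage 2
    ]

    cost_to_go = {"GOAL": 0.0}
    best_move = {}

    # Sweep backwards over stages, memoizing the optimal cost-to-go per state.
    for stage in reversed(stages):
        for state, actions in stage.items():
            best_next = min(actions, key=lambda nxt: actions[nxt] + cost_to_go[nxt])
            cost_to_go[state] = actions[best_next] + cost_to_go[best_next]
            best_move[state] = best_next

    # Reconstruct the optimal route from the stored decisions.
    state, path = "A", ["A"]
    while state != "GOAL":
        state = best_move[state]
        path.append(state)
    print("optimal cost from A:", cost_to_go["A"], "via", " -> ".join(path))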
intrasight, over 4 years ago
I was told ~30 years ago by a leading computer scientist in the NN field that biology has nothing to teach us in terms of implementation. I switched from CS to neuroscience anyway. I've wrestled with his statement ever since. I'll say that nothing I've seen since then has shown him wrong.
taliesinb, over 4 years ago
While the continual one-upmanship of ever more intricate biologically plausible learning rules is interesting to observe (and I played around at one point with a variant of the original feedback alignment), I think OP's alternative view is more plausible.

Fwiw I am involved in an ongoing project that is investigating a biologically plausible model for generating connectomes (as neuroscientists like to call them). The connectome-generator happens (coincidentally) to be a neural network. But exactly as the OP points out, this "neural network" need not actually represent a biological brain -- in our case it's actually a *hypernetwork* representing the process of gene expression, which in turn generates the biological network. Backprop is then applied to this hypernetwork as a (more efficient) proxy for evolution. In the most extreme case there need not be any learning at all at the level of an individual organism. You can see this as the ultimate end-point of so-called Baldwinian evolution, which is the hypothesized process whereby more and more of the statistics of a task are "pulled back" into genetically encoded priors over time.

But for me the more interesting question is how to approach the information flow from tasks (or 'fitness') to brains to genes on successively longer time scales. Can that be done with information theory, or perhaps with some generalization of it? I also think it is a rich and interesting challenge to *parameterize* learning rules in such a way that evolution (or even random search) can efficiently find good ones for rapid learning of specific kinds of task. My gut feeling is that biological intelligence has many components that are ultimately discrete computations, and we'll discover that those are reachable by random search if we can just get the substrate right, and in fact this is how evolution has often done it -- shades of Gould and Eldredge's "punctuated equilibrium".

(If anyone is interested in discussing any of these things, feel free to drop me an email.)
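As a rough illustration of the hypernetwork idea described above (not the project's actual code; the sizes, module names, and dummy task are placeholders), a small "genome" network can emit the weights of a task network, with backprop through the generated weights standing in for evolution:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    IN, HID, OUT = 8, 16, 2          # toy task-network sizes (placeholders)
    N_PARAMS = HID * IN + OUT * HID  # number of weights the hypernetwork emits

    # "Genome": a tiny network that outputs the task network's weights.
    genome = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, N_PARAMS))
    gene_code = torch.randn(1, 4)    # stands in for a compact genetic code

    def task_forward(x, flat_w):
        # Unpack the emitted parameters into two weight matrices and run the
        # generated ("developed") network on the task input.
        w1 = flat_w[: HID * IN].view(HID, IN)
        w2 = flat_w[HID * IN:].view(OUT, HID)
        return F.linear(torch.tanh(F.linear(x, w1)), w2)

    opt = torch.optim.Adam(genome.parameters(), lr=1e-3)
    x, y = torch.randn(32, IN), torch.randint(0, OUT, (32,))  # dummy task data

    for _ in range(100):
        flat_w = genome(gene_code).squeeze(0)   # "express" the connectome
        loss = F.cross_entropy(task_forward(x, flat_w), y)
        opt.zero_grad()
        loss.backward()                         # gradients flow only into the genome
        opt.step()

The task network itself has no trainable parameters here; all learning happens in the generator, which is the sense in which backprop acts as a stand-in for evolution rather than for within-lifetime learning.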
timlarshanson, over 4 years ago
But, if your realistically spiking, stateful, noisy biological neural network is non-differentiable (which, so far as I know, is true), then how are you going to propagate gradients back through it to update your ANN-approximated learning rule?

I suspect that, given the small size of synapses, the algorithmic complexity of learning rules (and there are several) is small. Hence, you can productively use evolutionary or genetic algorithms to perform this search/optimization -- which I think you'd have to anyway, due to the lack of gradients, or simply due to computational cost. Plenty of research going on in this field. (Heck, while you're at it, might as well perform a similar search over wiring topologies and recapitulate our own evolution without having to deal with signaling cascades, transport of mRNA and protein along dendrites, metabolic limits, etc.)

Anyway, coming from a biological perspective: evolution is still more general than backprop, even if in some domains it's slower.
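A minimal sketch of this kind of gradient-free search, assuming a made-up three-parameter plasticity rule and a placeholder fitness function (a plain evolution strategy for illustration, not any published method):

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(rule_params):
        # Placeholder: in practice this would train a (non-differentiable,
        # spiking) network with the candidate plasticity rule and return task
        # performance. Here we just score distance to an arbitrary target.
        target = np.array([0.1, -0.5, 0.3])
        return -np.sum((rule_params - target) ** 2)

    # Simple (mu, lambda) evolution strategy over the rule's 3 parameters.
    mu, lam, sigma = 5, 20, 0.2
    parents = rng.normal(size=(mu, 3))

    for gen in range(50):
        # Each offspring mutates a randomly chosen parent.
        offspring = parents[rng.integers(mu, size=lam)] + sigma * rng.normal(size=(lam, 3))
        scores = np.array([fitness(p) for p in offspring])
        parents = offspring[np.argsort(scores)[-mu:]]   # keep the best mu

    print("best rule parameters found:", parents[-1])

No gradients through the network are required; the fitness function only has to be evaluable, which is the point being made about non-differentiable biological substrates.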
Digitalis33, over 4 years ago
DeepMind, Hinton, et al. are still convinced that the brain must be doing something like backprop.

See Lillicrap address all common objections to backprop in the brain: https://www.youtube.com/watch?v=vbvl0k-aUiE&ab_channel=ELSCVideo

Also, from their paper "Backpropagation in the brain":

"It is not clear in detail what role feedback connections play in cortical computations, so we cannot say that the cortex employs backprop-like learning. However, if feedback connections modulate spiking, and spiking determines the adaptation of synapse strengths, the information carried by the feedback connections must clearly influence learning!"
pizza, over 4 years ago
If you are interested in deep learning with spiking neural networks, there is also the norse framework: https://github.com/electronicvisions/norse
erikerikson, over 4 years ago
Hebbian learning is a good biologically plausible learning rule. It works.

Minsky's result used a popular but too-simple model. Still, that led to back propagation, which the field has been squeezing as much as it can out of since.

Decades ago that result was bypassed by adding a term for location into the network model (i.e. Hopfield+Hebbian) and modulating according to a locationally differentiated trophic factor (i.e. the stuff that the molecular processes of learning use as input). This allows linearly inseparable functions to be learned (in not-really-but-important "contradiction" to Minsky's result). Jeffrey Elman and others found this in the 90s, and I was able to replicate it up to six dimensions in 2004. So we didn't really need back propagation, though it's been useful.

Admittedly, these models remove even more legibility from the models.
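For context, a minimal sketch of a Hebbian update (plain rate-based Hebb with Oja's normalizing term so the weights stay bounded; the single linear unit and random inputs are toy choices, and this is not the Hopfield+Hebbian variant described above):

    import numpy as np

    rng = np.random.default_rng(1)
    n_in, lr = 10, 0.01
    w = rng.normal(scale=0.1, size=n_in)

    for _ in range(1000):
        x = rng.normal(size=n_in)        # presynaptic activity (toy input)
        y = w @ x                        # postsynaptic activity (linear unit)
        # Oja's rule: the Hebbian term (y * x) plus a decay that normalizes ||w||,
        # so "cells that fire together wire together" without weights blowing up.
        w += lr * y * (x - y * w)

    print("learned weight norm:", np.linalg.norm(w))

The update uses only quantities available at the synapse (pre- and postsynaptic activity and the current weight), which is what makes it biologically plausible in the first place.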
ilaksh, over 4 years ago
Predictive coding seems not only plausible but also potentially advantageous in some ways, such as being inherently well suited to generative perception.
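For readers new to the term, a minimal sketch of predictive-coding-style learning for a single linear layer (a toy, fully invented setup): the layer predicts its input from a latent estimate, and both the latent and the weights are adjusted from the local prediction error rather than from a backpropagated gradient.

    import numpy as np

    rng = np.random.default_rng(2)
    n_in, n_latent = 16, 4
    W = rng.normal(scale=0.1, size=(n_in, n_latent))   # generative weights

    def predictive_coding_step(x, W, n_inner=20, lr_z=0.1, lr_w=0.01):
        z = np.zeros(n_latent)                 # latent estimate for this input
        for _ in range(n_inner):
            err = x - W @ z                    # local prediction error
            z += lr_z * (W.T @ err)            # settle the latent to reduce error
        W += lr_w * np.outer(err, z)           # purely local weight update
        return W, err

    for _ in range(500):
        x = rng.normal(size=n_in)              # toy "sensory" input
        W, err = predictive_coding_step(x, W)

    print("residual error norm:", np.linalg.norm(err))

The generative direction (latent to input) is what makes the scheme naturally suited to generative perception: the same weights that are learned are the ones used to synthesize predictions of the input.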
xingyzt, over 4 years ago
I'm not very familiar with deep learning. How does this compare to the biomimicking spike-timing-dependent plasticity (STDP) of spiking neural networks?

https://github.com/Shikhargupta/Spiking-Neural-Network#training-an-snn
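For comparison, a minimal sketch of pairwise STDP with exponential traces (the constants and random spike trains are arbitrary toy values, not taken from the linked repository): the synapse strengthens when a presynaptic spike precedes a postsynaptic one and weakens in the reverse order, using only locally available timing information.

    import numpy as np

    rng = np.random.default_rng(3)
    T, dt = 200, 1.0                     # toy simulation length (ms) and step
    tau, a_plus, a_minus = 20.0, 0.01, 0.012

    pre_spikes = rng.random(T) < 0.05    # toy random spike trains
    post_spikes = rng.random(T) < 0.05

    w, x_pre, x_post = 0.5, 0.0, 0.0     # weight and exponential spike traces

    for t in range(T):
        # Traces decay exponentially and jump on each spike.
        x_pre += -x_pre * dt / tau + pre_spikes[t]
        x_post += -x_post * dt / tau + post_spikes[t]
        # Pre-before-post (a post spike sees a recent pre trace) -> potentiate;
        # post-before-pre (a pre spike sees a recent post trace) -> depress.
        w += a_plus * x_pre * post_spikes[t] - a_minus * x_post * pre_spikes[t]
        w = min(max(w, 0.0), 1.0)        # clip to a plausible range

    print("final synaptic weight:", w)

Unlike backprop, there is no global error signal here; the weight change depends only on the relative timing of spikes on either side of the synapse.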
marco_craveiro, over 4 years ago
I'm surprised that Numenta [1] did not get a mention in the comments. As a mostly lay person (with moderate exposure to computational neuroscience) I quite like their approach, even though it seems their results have slowed down a bit of late. I still haven't finished parsing BAMI [2], but it seems very interesting.

[1] https://numenta.com/
[2] https://numenta.com/resources/biological-and-machine-intelligence/
scribu, over 4 years ago
> You can indeed use backprop to train a separate learning rule superior to naïve backprop.

I used to dismiss the idea of an impending singularity. Now I'm not so sure.

Hopefully AGIs will reach hard physical limits to self-improvement before taking over the world.
neatze, over 4 years ago
Comparisons of various neural architectures.

Deep Learning in Spiking Neural Networks: https://arxiv.org/pdf/1804.08150.pdf
monocasa, over 4 years ago
I've never understood why biological neural nets would need backprop.

Evolutionary pressure is its own applied loss function. It's less efficient than backprop, but gets you to solutions all the same.