So you mean to tell me that rediscovering older work with faster computers and strategically referencing (or omitting references) to make yourselves look like the founding fathers of a resurgent field is a good way to get your university-funded research lab snatched up for millions of dollars by Big Companies?<p>And to add onto this, another academic is getting his jimmies rustled because he didn't get the money his former PhD students did??<p>Heavens to Betsy!<p>---<p>Everyone self-promotes and acts in their own best interests. There is no clear divide between academic and industrial interests. Maybe in something purer where truths are evident (e.g. pure math), but not in something like machine learning, where your success and funding depend on armies of grad students fine-tuning things like the number of "hidden units" in some over-complicated model whose ultimate goal is to overfit a training set and cause more hype, etc.<p>Nothing wrong with this, though; progress happens continually, just not linearly: <a href="https://en.wikipedia.org/wiki/Hype_cycle" rel="nofollow">https://en.wikipedia.org/wiki/Hype_cycle</a>
So, for anyone not aware: Schmidhuber is _obsessed_ with this. He wrote an enormous literature review of deep learning [0] basically because he felt that people weren't crediting ideas enough. This isn't a one-off essay for him; he's been banging this drum for quite a while.<p>Not saying he's wrong, just FYI.<p>[0] <a href="http://arxiv.org/abs/1404.7828" rel="nofollow">http://arxiv.org/abs/1404.7828</a>
While I do not have anything invested in Deep Learning, I do have a similar reaction because I am familiar with the research from 10-20 years ago, particularly around neural Turing machines. From that perspective, most modern Deep Learning is essentially that older research with the primary novelty being better marketing and <i>much</i> faster computers. I can understand why someone like Schmidhuber would be irritated by the apparent assignment of credit to people who are essentially repackaging old computer science, given how much Schmidhuber has done in the field.<p>DeepMind is a bit of an exception to this. At least one of the founders was involved in quite a bit of original research way back then.<p>This phenomenon is common in theoretical computer science. Timing and marketing matter a lot when it comes to getting credit for important inventions. I've seen it many times.
I took machine learning courses from 1999 until late 2001. One of my professors (who had worked with Vapnik back when we didn't know if support vector machines were a good idea) said that he didn't use ANNs much because Hinton was probably the only one who knew how to use them.<p>I'm telling this anecdote because, even though I agree that we are forgetting to mention a lot of names, the "PR" work that Hinton et al. did was necessary (IMO) to bring ANNs back into the mainstream.
This is a good critique: it's important to cite the people who have laid the early groundwork, regardless of how far in the past that work was done.
The author certainly seems accomplished, but his tone and egotism undercut his message. For example from the front page of his site:<p>"His formal theory of creativity & curiosity & fun explains art, science, music, and humor."<p>I've also read papers of his that take completely off-the-wall pot-shots at other researchers.
Lecture on the topic by the author: <a href="https://www.youtube.com/watch?v=JSNZA8jVcm4" rel="nofollow">https://www.youtube.com/watch?v=JSNZA8jVcm4</a><p>Several of the founders of DeepMind were his PhD students.
One thing which works against the "cite everything" approach is that most of the major conferences have page limits of 8-10 pages, with a 1-page bonus for references. That means if you go over 1 page of references (at least at NIPS), you cut into the meat of the paper, reviewers look on in disdain and give poor marks, etc. So you often have to actively prune for the most recent and directly relevant citations, which sometimes counts out semi-relevant but older work in favor of more relevant recent work.<p>Much of Dr. Schmidhuber's work is very interesting and <i>especially</i> relevant now that RNNs are really heating up again - but it is sometimes hard to figure out exactly <i>which</i> of his papers to cite, because many are partially relevant. And having a full page of only Schmidhuber citations is no good either...<p>Speaking as a member of the Montreal lab, I am much more up to date with the work that happens here - so it is hard to fight the natural tendency to cite recent papers you know (since they all came from work you know of, because you were <i>there</i>). Notice too that all 3 (Hinton, LeCun, and Bengio) worked directly together at some point, and collaborated often beyond that. So a version of this is in effect there too, whereas Juergen has been more separated (both geographically and in work focus) than the other 3. NYU, Toronto, and Montreal are all within an 8-hour triangle!<p>Not to take anything away from his points (I try to cite as many of his papers as possible without seeming ridiculous, generally), but these are the general factors at play. We cannot possibly cite every paper in the field, and shining the light on new works can be more important than citing older work <i>AS LONG AS</i> work that was already done "in the nineties" is not claimed as a pure innovation.<p>Claiming to improve some technique, or to take it from curious to usable, is more than fine - but given the recent deep learning hype, even recent papers are getting overshadowed by others claiming some new innovation that already exists in <i>very current literature</i>.<p>Especially given the work that is coming out of industrial labs (Google, FB, MSR, etc.), it is fairly frequent to see the same model being touted as new (with minor citations if lucky) when the exact same technique first appeared 6 months ago. Being well-read is not optional as an academic - it is a requirement! The PR machine of these companies is unfortunately very effective at dominating the airwaves if you have competing or related work, especially if you are not from a school with good press, e.g. MIT or Stanford.
I don't know.<p>On the one hand, at the conceptual level, unless you are at the cutting edge of CS theory, I'm pretty sure almost anything else that is done in computer science is a mere re-wording of something that was done in the 1970s-1980s. So there is no "holier than thou" at this level.<p>On the other hand, in terms of practical results in context, there are many important consequences of being able to take old concepts and run them faster, because the hardware has improved and, well, the entire world is different.<p>A big part of "popularizing" a technique is having a good implementation that takes advantage of advances in computing speed. So the author of the article misses the practical value of popularizing.<p>At the same time, what he says is valuable because he touches on a fundamental choice that we all make: do you want to be a groundwork layer or a popularizer?<p>The problem of course is that groundwork layers are mostly forgotten, with their contributions recognized posthumously, as that's how far out you have to be to lay any new groundwork, and it's difficult to predict what will be the foundation for the next hundreds of years.<p>It's not just deep learning; basically anything that becomes popular enough to be noticed here probably has a long history behind it, and if we are to move forward we need to be in the headspace of those who had the sense back then to form it, and not in the space of popularizing or being the tool of the popularizer.
Right or not (I tend to think he is), it's incredibly shortsighted to write such an article, which is bound to make him look bitter and low-status. If you want to do this, you get a third party to do it, come on...