This is not a controversial statement. Most deep learning pioneers (like Hinton and Bengio) agree and are actively looking for a way forward.

Of course there have been many breakthroughs. Solving the exploding and vanishing gradient problems was important: throwing more hardware at a model can't help if the gradients don't work.

The big picture, though, is that these are all technical tweaks to the same basic underlying idea that has existed since the 80s and 90s. Just adding more layers is not enough.

Deep learning is still statistical learning: a DL algorithm attempts to learn an output distribution that matches the target distribution.
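To make that last point concrete, here's a minimal sketch (toy numbers, not anyone's actual training code): the standard cross-entropy loss decomposes into the fixed entropy of the target plus the KL divergence between target and model output, so minimizing it is exactly "pushing the output distribution toward the target distribution."

    import numpy as np

    # Toy 3-class distributions (made-up values for illustration):
    p = np.array([0.7, 0.2, 0.1])  # target distribution (fixed by the data)
    q = np.array([0.5, 0.3, 0.2])  # model's output (what training adjusts)

    cross_entropy = -np.sum(p * np.log(q))   # H(p, q), the training loss
    entropy       = -np.sum(p * np.log(p))   # H(p), constant w.r.t. the model
    kl            = np.sum(p * np.log(p / q))  # KL(p || q), the mismatch

    # Identity: H(p, q) = H(p) + KL(p || q), so minimizing the loss
    # minimizes the divergence between output and target distributions.
    assert np.isclose(cross_entropy, entropy + kl)
    print(f"H(p,q)={cross_entropy:.4f}  H(p)={entropy:.4f}  KL={kl:.4f}")

Since H(p) doesn't depend on the model, the loss is minimized exactly when KL(p || q) = 0, i.e. when q = p.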