There is a recent 5-page theoretical paper on this topic that I thought was pretty interesting, and it tackles both deep nets and recurrent nets: http://arxiv.org/abs/1509.08101

Here is the abstract:

This note provides a family of classification problems, indexed by a positive integer k, where all shallow networks with fewer than exponentially (in k) many nodes exhibit error at least 1/6, whereas a deep network with 2 nodes in each of 2k layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated k times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.
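For intuition, here is a minimal NumPy sketch (the names and the choice k = 3 are mine) of the flavor of construction behind results like this: a tent map, which two ReLU units can express exactly, composed with itself so that depth buys exponentially many linear pieces:

  import numpy as np

  def relu(x):
      return np.maximum(x, 0.0)

  def tent(x):
      # Tent map on [0, 1] built from two ReLU units:
      # 2*relu(x) - 4*relu(x - 0.5) rises linearly on [0, 0.5]
      # and falls linearly back to 0 on [0.5, 1].
      return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

  # Composing the two-unit layer k times yields a sawtooth with
  # 2^(k-1) peaks, i.e. exponentially many linear pieces in the
  # depth k; a shallow ReLU net needs on the order of one unit
  # per linear piece to reproduce it.
  k = 3
  x = np.linspace(0.0, 1.0, 9)
  y = x
  for _ in range(k):
      y = tent(y)
  print(np.round(y, 3))  # prints 0, 1, 0, 1, ... across the grid

The alternating 0/1 outputs at evenly spaced inputs are exactly the kind of labeling that is cheap for a deep net to produce and, by the paper's argument, provably expensive for a shallow one.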
1) I'm curious to learn more about the statement: "Deep learning is a branch of machine learning algorithms based on learning multiple levels of representation. The multiple levels of representation corresponds to multiple levels of abstraction."

What evidence exists that the 'multiple levels of representation', which I understand to mean, generally, the multiple hidden layers of a neural network, actually correspond to 'levels of abstraction'?

2) I'm further confused by: "Deep learning is a kind of representation learning in which there are multiple levels of features. These features are automatically discovered and they are composed together in the various levels to produce the output. Each level represents abstract features that are discovered from the features represented in the previous level."

This implies to me that deep learning is "unsupervised learning". Are deep learning nets all unsupervised? Most traditional neural nets are supervised.
I wonder whether the "lots of data" claim is right. If I show you, say, twenty similar-looking Chinese characters in one person's handwriting, and the same twenty in another person's handwriting, you'll probably do a good job of classifying them (though you may not have an easy time) with very little data.