Wow. As far as I know, this is the first time anyone reputable[a] has claimed to <i>show</i> (!) that the "manifold hypothesis" is the fundamental principle that makes deep learning work, as has long been believed:<p><pre><code> "In this work, we give a geometric view to
understand deep learning: we show that the
fundamental principle attributing to the
success is the manifold structure in data,
namely natural high dimensional data
concentrates close to a low-dimensional
manifold, deep learning learns the manifold
and the probability distribution on it."
</code></pre>
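To make the quoted claim concrete: the manifold hypothesis says that high-dimensional data (images in pixel space, say) clusters near a much lower-dimensional surface. Here's a toy numpy sketch of my own (not from the paper) in which local PCA recovers the intrinsic dimension of a Swiss roll:<p><pre><code> # Toy illustration, not the paper's code: points in R^3 that
# actually live on a 2-D manifold (a "Swiss roll").
import numpy as np

rng = np.random.default_rng(0)
n = 2000
t = 1.5 * np.pi * (1 + 2 * rng.random(n))  # manifold coordinate 1
h = 10 * rng.random(n)                     # manifold coordinate 2
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

# Local PCA: take one point's 50 nearest neighbours and look at the
# singular values. Two dominate, matching the intrinsic dimension 2.
idx = np.argsort(np.linalg.norm(X - X[0], axis=1))[:50]
local = X[idx] - X[idx].mean(axis=0)
print(np.round(np.linalg.svd(local, compute_uv=False), 2))
</code></pre>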
Moreover, the authors also claim to have come up with a way of measuring how hard it is for any deep neural net (of fixed size) to learn a parametric representation of a particular lower-dimensional manifold embedded in some higher-dimensional space:<p><pre><code> "We further introduce the concepts of rectified
linear complexity for deep neural network
measuring its learning capability, rectified
linear complexity of an embedding manifold
describing the difficulty to be learned. Then
we show for any deep neural network with fixed
architecture, there exists a manifold that
cannot be learned by the network."
</code></pre>
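My reading of "rectified linear complexity": a ReLU network computes a piecewise-linear map, so the number of linear pieces it can produce is a natural capacity measure, and a manifold whose encoding needs more pieces than the architecture can supply is unlearnable. A crude way to see the pieces (my sketch, not the paper's construction) is to count distinct ReLU activation patterns along a line through input space:<p><pre><code> # Crude proxy for rectified linear complexity: each distinct
# activation pattern met along a line marks a new linear piece.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

def pattern(x):
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

ts = np.linspace(-5, 5, 20000)
pieces = {pattern(np.array([t, 0.3 * t])) for t in ts}
print("linear pieces met along the line:", len(pieces))
</code></pre>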
Finally, the authors also propose a novel way to control the probability distribution in the latent space. I'm curious to see how their method compares and relates to recent work, e.g., with discrete and continuous normalizing flows:<p><pre><code> "...we propose to apply optimal mass
transportation theory to control the
probability distribution in the latent space."
</code></pre>
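For intuition on the optimal-transport part, a 1-D toy of my own (not the authors' algorithm): under squared cost, the optimal transport map between two samples is just monotone rearrangement: sort both sets and match ranks. That is the simplest possible instance of "controlling the probability distribution in the latent space":<p><pre><code> # 1-D optimal transport by sorting; a toy, not the paper's method.
import numpy as np

rng = np.random.default_rng(2)
z = rng.exponential(size=1000)  # stand-in for encoder latents
target = rng.normal(size=1000)  # samples of the desired latent law

# Under squared cost the OT map sends the k-th smallest latent
# to the k-th smallest target value (monotone rearrangement).
T = np.empty_like(z)
T[np.argsort(z)] = np.sort(target)
print("transported mean/std:", T.mean().round(2), T.std().round(2))
</code></pre>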
This is <i>not</i> going to be a light read...<p>--<p>[a] One of the authors, Shing-Tung Yau, is a Fields medalist: <a href="https://news.ycombinator.com/item?id=18987219" rel="nofollow">https://news.ycombinator.com/item?id=18987219</a>
[An off-topic question from somebody not familiar with the academic world]
Two of the authors are at a Chinese university, and two are in different departments at a US university. In general, how does this kind of intercontinental collaboration start, and how does it progress? How are roles defined when multiple people are involved in a theoretical paper like this? Are there tools that help with collaborative paper writing?
Possibly less sophisticatedly, I think of them as a sandwich of affine maps and nonlinear isotropies (like those that give the irregular rings in tree trunks). The affinities are represented nicely in GL(n+1) with a homogeneous-coordinates trick related to neuron biases; see the sketch below. A question would be whether there's something interesting to say about the interactions of the affinities and the isotropies in group-theoretic terms (which I don't know).
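For anyone unfamiliar, the homogeneous-coordinates trick is standard linear algebra (nothing specific to this paper): append a constant 1 to the input, and the affine map x -> Wx + b becomes a single matrix acting on R^(n+1), so stacks of affine layers compose inside GL(n+1):<p><pre><code> # Standard homogeneous-coordinates trick, sketched in numpy.
import numpy as np

rng = np.random.default_rng(3)
n = 3
W, b = rng.normal(size=(n, n)), rng.normal(size=n)

A = np.eye(n + 1)
A[:n, :n] = W   # linear part in the top-left block
A[:n, n] = b    # bias becomes the last column

x = rng.normal(size=n)
x_h = np.append(x, 1.0)  # homogeneous coordinates
assert np.allclose((A @ x_h)[:n], W @ x + b)
</code></pre>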
<i>> ...we show that the fundamental principle attributing to the success is the manifold structure in data...<p>> Then we show for any deep neural network with fixed architecture, there exists a manifold that cannot be learned by the network.</i><p>I'd venture a guess that you can extend this result to show that, for any deep neural network with fixed architecture, there exists an adversarial manifold it must be vulnerable to.<p>In other words, not only is there a manifold the neural network <i>cannot</i> learn, but there is also a manifold it <i>will</i> learn, but incorrectly.
Another interesting paper on optimal transportation and GANs: <a href="https://arxiv.org/abs/1710.05488" rel="nofollow">https://arxiv.org/abs/1710.05488</a>
I know next to nothing about deep learning. But this geometric interpretation really reminds me of the way self-organising maps work. Is there a real connection there, or is that superficial?
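For reference, what I mean by a self-organising map: it fits a low-dimensional grid of units to the data, which at least superficially resembles learning the data manifold. A minimal SOM in numpy (the textbook algorithm, nothing from the paper):<p><pre><code> # Minimal 1-D self-organising map pulled onto 2-D data.
import numpy as np

rng = np.random.default_rng(4)
t = rng.random(500) * 2 * np.pi
X = np.column_stack([t, np.sin(t)]) + 0.05 * rng.normal(size=(500, 2))

m = 20                            # units in the 1-D map
units = rng.normal(size=(m, 2))
for step in range(3000):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(np.linalg.norm(units - x, axis=1))  # best match
    lr = 0.5 * (1 - step / 3000)                        # decaying rate
    sigma = max(m / 2 * (1 - step / 3000), 1.0)         # shrinking kernel
    d = np.abs(np.arange(m) - bmu)                      # grid distance
    units += lr * np.exp(-d**2 / (2 * sigma**2))[:, None] * (x - units)
# The chain of units now traces the sine curve, i.e. the data manifold.
print(np.round(units, 2))
</code></pre>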