Though I haven't analyzed the particulars of this article or the paper, I can say this much: we already know that compression is incomplete in this sense -- we cannot know the actual Kolmogorov complexity [1] of a piece of data, because of the halting problem (a sketch of why is below). It shouldn't be surprising, then, that programs of a certain schema, neural networks, suffer from similar issues.

One could suppose that neural networks that are less code-parameterized and more data-parameterized (weights) would be less prone to these divergences. But it's already established that the more data-like networks aren't Turing-complete, and aren't powerful enough to solve the kinds of problems we really want to solve (AI). For that we have to turn to Hopfield nets, Boltzmann machines, and RNNs -- and the training process for those nets is encumbered by the very capabilities that make them interesting. Exploring a field of numbers is one thing; exploring a field of code is quite another. The code<->data mapping is about as non-linear as a function gets, and it resists concise mathematical description; it is, in Wolfram's terms, "computationally irreducible." The closer a NN comes to covering an actual space of Turing-complete functions, the farther it is from actually being trainable. Alas... we will figure out middle grounds, as we already have.

[1] https://en.wikipedia.org/wiki/Kolmogorov_complexity
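To make the Kolmogorov-complexity point concrete, here is a minimal Python sketch. The toy description language in run_toy is my own invention, chosen only because every program in it is guaranteed to halt; a Turing-complete language can't give that guarantee, and that is exactly what breaks the search.

    # Brute-force "shortest description" search in a toy, non-universal language.
    from itertools import product

    def run_toy(program: bytes) -> bytes:
        """Toy interpreter: byte 0 is a repeat count, the rest is a payload.
        Every toy program halts, which is also why this isn't Turing-complete."""
        if not program:
            return b""
        return program[1:] * program[0]

    def shortest_description(x: bytes, max_len: int = 3):
        """Length of the shortest toy program producing x, by exhaustive search."""
        for length in range(1, max_len + 1):
            for candidate in product(range(256), repeat=length):
                if run_toy(bytes(candidate)) == x:
                    return length
        return None

    print(shortest_description(b"ababab"))  # 3: count byte 3, payload b"ab"

    # Swap run_toy for a universal interpreter and some candidate programs never
    # halt; the halting problem says we can't filter those out in advance, so the
    # same loop stops being an algorithm.  The best we can do is impose a step
    # budget, which only ever yields an upper bound on K(x).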
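The data-parameterized vs. code-like distinction can be sketched the same way. The weights and the stopping rule below are made up; only the shape of the computation matters. A fixed-depth feedforward pass always does the same finite amount of work, while a recurrent loop whose length depends on its own state is the kind of computation that (given unbounded precision and steps) can simulate a Turing machine, which is where halting questions re-enter.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    W_h, W_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))

    def feedforward(x):
        """Fixed depth: exactly two matrix multiplies, regardless of input.
        Always halts; not expressive enough to be Turing-complete."""
        return W2 @ np.tanh(W1 @ x)

    def recurrent(x, max_steps=10_000):
        """A loop whose length depends on the evolving hidden state."""
        h = np.zeros(4)
        for step in range(max_steps):
            h = np.tanh(W_h @ h + W_x @ x)
            if h.sum() < 0:          # toy, data-dependent stopping rule
                return h, step
        return h, max_steps          # budget exhausted: we gave up, not the net

    print(feedforward(np.ones(3)))
    print(recurrent(np.ones(3)))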
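And the "field of numbers vs. field of code" point, again as a toy of my own: nudge a weight and the output moves smoothly, which is what gradient descent lives on; make a one-character edit to a program and its behavior can jump anywhere.

    import numpy as np

    def net(w, x):
        return np.tanh(w * x)            # output varies smoothly with the weight w

    x = 2.0
    print(net(1.00, x), net(1.01, x))    # nearly identical -> a usable gradient

    prog_a = "x * 2"
    prog_b = "x ** 2"                    # a one-character edit
    print(eval(prog_a, {"x": 10}),       # 20 vs 100: no smooth path between programs
          eval(prog_b, {"x": 10}))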