It's funny to note that the first author signed a boycott [0] of Nature's new AI journal (Nature Machine Intelligence), yet is one of the first to publish there.

[0] https://openaccess.engineering.oregonstate.edu/signatures
It sounds like we can, in fact, determine its practical effect: it is zero. No real-world machine learning algorithm can use the real numbers. We cannot obtain them to feed in as input, our real-world machines could not process them if we could, and if we got real-number-based answers, we could not use them. Whether or not the universe itself is based on real numbers, we do not have access to them either way. Our real-world machines only have access to the computable numbers (and only a subset of those, at that).

I'm only speaking to the practical impact.

It's possible someone will find a way to work this down into something practical, but I'd be fairly surprised. Real machine learners get finite data, for which the continuum hypothesis is irrelevant.

(I assume the original authors fully understand this.)
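To make the finite-data point concrete, here is a minimal Python sketch (my own illustration, not anything from the paper) of how few "reals" a machine actually has access to: counting the distinct 64-bit floats between 1.0 and 2.0.

```python
import struct

def count_floats_between(lo: float, hi: float) -> int:
    """Count the distinct IEEE-754 float64 values in [lo, hi) for positive lo, hi.

    For positive floats, the 64-bit patterns are ordered the same way as the
    values, so the difference of the bit patterns counts representable numbers.
    """
    as_bits = lambda x: struct.unpack("<q", struct.pack("<d", x))[0]
    return as_bits(hi) - as_bits(lo)

# Every "real-valued" feature a model actually sees in [1.0, 2.0) is one of
# exactly 2**52 machine numbers: a finite, hence computable, set.
print(count_floats_between(1.0, 2.0))           # 4503599627370496
print(count_floats_between(1.0, 2.0) == 2**52)  # True
```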
The actual paper: https://www.nature.com/articles/s42256-018-0002-3

(I find the actual research paper easier to understand than the news coverage.)
I much prefer Yedidia & Aaronson's paper, which explicitly builds a Turing machine whose halting behavior is independent of ZFC:

https://www.scottaaronson.com/busybeaver.pdf
Though I haven't analyzed the particulars of this article or the paper, I can say: we already know that compression is incomplete. That is, we cannot know the actual Kolmogorov complexity [1] of a piece of data, due to the halting problem. It shouldn't be surprising, then, that programs of a certain schema, neural networks, suffer from similar issues.

One could suppose that neural networks that are less code-parameterized and more data-parameterized (weights) would be less prone to such divergences. Well, it's already established that the more data-like NNs aren't Turing complete, and aren't powerful enough to solve the kind of problems we really want to solve (AI). We have to turn to Hopfield nets, Boltzmann machines, and RNNs for that. The learning/training process for these nets is pretty much encumbered by their capabilities. That is, exploring a field of numbers is one thing; exploring a field of code is another. Code<->Data is about the most non-linear function in the universe, the one that cannot be described concisely by mathematics. It is, in Wolfram's terms, "computationally irreducible." The closer a NN reflects an actual space of Turing-complete functions, the farther it is from actually being trainable. Alas, we will figure out middle grounds, as we already have.

[1] https://en.wikipedia.org/wiki/Kolmogorov_complexity
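As a concrete, heavily hedged illustration of that incompleteness: the true Kolmogorov complexity of a string is uncomputable, but any off-the-shelf compressor gives a computable upper bound. A rough Python sketch, with zlib standing in for "some description method":

```python
import os
import zlib

def complexity_upper_bound(data: bytes) -> int:
    """Length of one particular description of `data` (a zlib stream).

    This is only an upper bound on the Kolmogorov complexity (up to an additive
    constant for the decompressor); the true minimum description length is
    uncomputable because of the halting problem.
    """
    return len(zlib.compress(data, 9))

regular = b"ab" * 5000       # highly regular 10,000-byte string
random_ = os.urandom(10000)  # incompressible with overwhelming probability

print(complexity_upper_bound(regular))  # small: the regularity is found
print(complexity_upper_bound(random_))  # roughly 10,000 or more: no short description found
```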
The author puts Gödel's incompleteness theorems and the continuum hypothesis on the same level, which is misleading. The continuum hypothesis is unprovable in our current mathematical foundation, ZFC, but there are extensions of ZFC that make the continuum hypothesis either true or false.

Incompleteness, on the other hand, is a property of every sufficiently complex formal system and thus poses a general constraint in logic.

That a particular learning problem is not provable in ZFC is not that surprising. Connections between learning and incompleteness would be far more interesting (and there is a lot of pseudo-research going on, "proving" that humans are not simulatable by a computer, etc.)
How would this affect the practical nature of the problem? Also, why is this novel? All machine learning algorithms can be implemented on a Turing machine, so they all are subject to the halting problem.
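For anyone who hasn't seen it, the diagonal argument behind that halting-problem claim fits in a few lines of Python; `halts` here is the impossible oracle, not a real function:

```python
def halts(program, argument) -> bool:
    """Hypothetical halting oracle: True iff program(argument) would halt.

    No total, always-correct implementation can exist; the stub is here only
    so that the contradiction below is explicit.
    """
    raise NotImplementedError("uncomputable")

def diagonal(program):
    # Do the opposite of whatever the oracle predicts about `program` run on itself.
    if halts(program, program):
        while True:   # predicted to halt, so loop forever
            pass
    return None       # predicted to loop forever, so halt immediately

# diagonal(diagonal) halts if and only if halts(diagonal, diagonal) is False,
# contradicting the oracle's own answer, so `halts` cannot be implemented.
```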
I'm curious about the applications of this to machine learning in public policy. In papers like Big Data's Disparate Impact (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899) we see misguided and underbudgeted policymakers using very sketchy and very opaque ML algorithms to, for example, decide who stays in prison and who does not.

If we can prove that the "decision problem" of who stays in prison and who does not is *undecidable*, independent of the specific implementation of the ML algorithm, this could make a case for stopping such overreaches of ML.
As the authors say in their paper, "[a] closer look reveals that the source of the problem is in defining learnability as the existence of a learning function rather than the existence of a learning algorithm". Algorithmic decision making under uncertainty is NOT the same thing as finding learning functions over infinite domains. Sometimes such a function does not exist; why should it? Although the authors seem perfectly aware of this fundamental shortcoming in their analysis, they nevertheless go ahead and publish a whole paper about learning functions. This has a name: intellectual fraud.
> Because EMX is a new model in machine learning, we do not yet know its usefulness for developing real-world algorithms. So these results might not turn out to have practical importance.

I don't think that's the correct conclusion. More likely, all learning models have unprovable problems, so the choice of model is not important wrt the existence of unprovable problems.

Really, this is not a surprising result. All mathematical systems have unprovable problems. ML is a mathematical system and no exception.
Can someone explain the meaning of the sentence "All distributions in this text are finitely supported over the σ-algebra of all subsets of X"?
On the one hand, it is impossible (assuming the axiom of choice) to extend Lebesgue measure to the σ-algebra of all subsets of [0, 1]. On the other hand, if a distribution is finitely supported, then there is no point in using a σ-algebra other than the one generated by its support, right?
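For a finitely supported distribution, every subset of the space gets a well-defined probability anyway, so taking the σ-algebra to be the full power set costs nothing. A minimal Python sketch (my own illustration, not the paper's construction):

```python
from typing import Callable, Dict, Hashable, Set

def finitely_supported_measure(weights: Dict[Hashable, float]) -> Callable[[Set], float]:
    """Turn finitely many point masses into a probability for *any* subset.

    Because the support is finite, P(A) = sum of the weights of support points
    lying in A is well defined for every subset A of the underlying space, so
    nothing smaller than the full power set is needed as the σ-algebra.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-12, "weights must sum to 1"

    def P(A: Set) -> float:
        return sum((w for x, w in weights.items() if x in A), 0.0)

    return P

# A distribution on the real line supported on three points.
P = finitely_supported_measure({0.0: 0.5, 1.0: 0.25, 3.0: 0.25})
print(P({0.0, 3.0}))   # 0.75
print(P({2.718}))      # 0.0; any subset at all gets a probability
```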
The conclusions emphasize that there is no direct practical impact on machine learning algorithms; the issue lies in the way learnability is defined and treated when mathematical models are used to describe it. Some of these mathematical models can lead to undecidable learnability questions if the wrong model is chosen.
The idea that machine learning is a subset of mathematics is very dangerous. In the real world the data is never prepped and clean, and the system is complex and interacts with humans and society at many levels.
> Because EMX is a new model in machine learning, we do not yet know its usefulness for developing real-world algorithms. So these results might not turn out to have practical importance. But we do now know that we should be careful when introducing new models of learning. Moreover, we might need to look again at the many subtleties that can come up, even in established learning models.

In other words, "machine learning" here is a buzzword, without which the paper would probably get less attention.