Karsten correctly described the consensus on that thread, including my own view as one of the folks named in the article: that accepting less-than-open "public" datasets, like the dumps of the Internet made available by the Common Crawl Foundation, may have been an acceptable compromise to cast a wider net in recognition of AI industry norms. I no longer believe that to be the case. I now accept the view of Open Source Definition (OSD) author Bruce Perens that the data IS the source, on the basis that it is what you need to modify in order to freely change the output of the system.<p>The OSI's position that ANY data is acceptable has shifted the Overton window of Open Source. Categorising data into open, public, obtainable, and unshareable non-public classes, only to then accept all of them, is a form of doublespeak that appears to maintain openness while accommodating even the most restrictive data. We don't negotiate with terrorists.<p>Indeed, there are two dimensions to "Open" AI systems which "can be freely used, modified, and shared by anyone for any purpose" (per The Open Definition). The first is openness, which is already well covered by the Open Source Definition. The second is completeness, which is covered implicitly (after all, AI systems are software) but which would need to be specified in approved frameworks that could be self-applied, like the Model Openness Framework (MOF), were a new Class 0 to be created requiring open data licenses rather than "any license or unlicensed" like Class I.<p>In other words, the OSD covers openness but not completeness (at least not explicitly, which is arguably a bug the community may want to fix in a future version so that it covers both). The MOF covers completeness but not openness. The OSI's proposed Open Source AI Definition (OSAID) covers neither, so any vendor using it to open-wash closed systems as Open Source AI rightly deserves ridicule.