To me, the ML situation looks roughly like this.<p>(1) Model weights are something like a bytecode blob. You can run it in a conformant interpreter and do inference.<p>(2) Things like llama.cpp are the "bytecode interpreter" part, something that can load the weights and run inference.<p>(3) The training setup is like a custom "compiler" which turns training data into the "bytecode" of the model weights.<p>(4) The actual training data is like the "source code" for the model, the input to the training "compiler".<p>Currently (2) is well served by a number of open-source offerings. (1) is what is usually released when a new model comes out. (1) + (2) give you the ability to run inference independently.<p>AFAICT, Red Hat suggests that an "open-source ML model" must include (1), (2), and (3), so that the way the model has been trained is also open and reusable. I would say that's great for scientific / applied progress, but I don't think it's "open source" proper. You get a binary blob and a compiler that can produce and patch it, but you can't reproduce it the way the authors did.<p>Releasing the training set, the (4), would to my mind be crucial for the model to be "open source" the way an open-source C program is.<p>I understand that the training set is massive, may contain a lot of data that can't easily be released publicly but was licensed for training purposes, and that training from scratch may cost millions, so releasing (4) is very often infeasible.<p>I still think that (1) + (2) + (3) should not be called "open source", because the source is not open. We need a different term, like "open structure" or something. It's definitely more open than something that's only available via an API, or as weights alone, but not completely open.
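To make the (1) + (2) split concrete, here is a minimal sketch of independent inference using the llama-cpp-python bindings; the weights path and prompt are hypothetical placeholders, not from any particular release.

    # A minimal sketch: (1) released weights + (2) an open "interpreter" = inference.
    # Assumes `pip install llama-cpp-python`; the GGUF path below is hypothetical.
    from llama_cpp import Llama

    # (1) The "bytecode blob": opaque weights from a model release.
    WEIGHTS_PATH = "models/example-model-7b.Q4_K_M.gguf"

    # (2) The "bytecode interpreter": llama.cpp loads the blob and runs it.
    llm = Llama(model_path=WEIGHTS_PATH, n_ctx=2048)

    # Inference works without (3) the training setup or (4) the training data.
    result = llm("Explain what a bytecode interpreter does.", max_tokens=128)
    print(result["choices"][0]["text"])

Note that nothing here requires the "compiler" (3) or the "source" (4); that asymmetry is the whole point.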
I prefer the ML policy of the Debian Deep Learning Team.<p><a href="https://salsa.debian.org/deeplearning-team/ml-policy/" rel="nofollow">https://salsa.debian.org/deeplearning-team/ml-policy/</a>
> More than three decades ago, Red Hat saw the potential of how open source development and licenses can create better software to fuel IT innovation. Thirty-million lines of code later, Linux not only developed to become the most successful open source software but the most successful software to date.<p>This seems to conflate Red Hat and Linux, and to equate Red Hat with open source itself. Red Hat is Linux, but Linux is not Red Hat, especially now that Red Hat has decided to restrict access to the RHEL source (<a href="https://www.itworldcanada.com/article/red-hat-decision-turns-world-of-open-source-linux-upside-down/543157" rel="nofollow">https://www.itworldcanada.com/article/red-hat-decision-turns...</a>).<p>And a pet grammatical peeve of mine:<p>> ... in some respects they serve a similar function to code.<p>I see this everywhere now -- IMHO it should be "... serve a function similar to code." Doesn't the original grate on your ear?<p>Also, this is a Turing-test bot detector -- bots don't use this weird grammatical construction, only humans do.
Congrats to the NeuralMagic team on being acquired! I don't know if you know this, but I worked with you on Discord a few times. Your team is always willing to go above and beyond in pushing out popular models in specific quant formats compatible with vLLM. And you're one of the few Hugging Face orgs that my boss can actually trust. Well deserved!
Disappointing that Red Hat is basically validating open weights as open source, and excusing it by saying this:<p>> The majority of improvements and enhancements to AI models now taking place in the community do not involve access to or manipulation of the original training data. Rather, they are the result of modifications to model weights or a process of fine tuning which can also serve to adjust model performance.<p>Well yes, because they have no access to anything more. With the training source code and data they might do something different. If you don't have all the things used to produce the final result, it's not open source.
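For what it's worth, the weights-only "improvements" Red Hat describes look roughly like the sketch below; this is a hedged example using the transformers and peft libraries with a hypothetical base-model name, not anything from Red Hat's own tooling.

    # A sketch of weights-only modification: LoRA fine-tuning adjusts the
    # released weights but never needs the original training corpus.
    # Assumes `pip install transformers peft`; the model name is hypothetical.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("example-org/open-weights-7b")

    # Attach small trainable adapter matrices to the frozen base weights.
    config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapters train, not the base

    # From here you train on *your own* data; nothing in this loop gives you
    # access to, or a substitute for, the original training data.

Which illustrates the point: the community works this way because it has to, not because the training data wouldn't matter.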
> We believe that these concepts can have the same impact on artificial intelligence<p>Where the concept is the exploitation of thousands of volunteers while repackaging their work. (I know that Red Hat sponsors <i>some</i> people, sometimes to the detriment of projects, but a lot of the work is not sponsored, and it especially wasn't back when Red Hat was establishing itself.)
I largely agree with these points. However, it is an awkward position coming from Red Hat, which is the best-funded Linux distribution there is, and <i>still</i> not part of the reproducible builds project, nor investing in full-source bootstrapping, which means no one can exactly reproduce their published artifacts from source or prove they were not tampered with (see the sketch below). (Same with Fedora.)<p>Glass houses.
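To spell out what "reproducible" buys you, here is a hedged sketch of the property in question; `build.sh` stands in for a hypothetical pinned build script, not Red Hat's actual tooling.

    # A sketch of the reproducible-builds property: rebuilding the same source
    # in a pinned environment must yield a bit-identical artifact, so anyone
    # can compare their rebuild against the vendor's published binary.
    import hashlib
    import subprocess

    def sha256(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Hypothetical: run the same pinned build twice.
    subprocess.run(["./build.sh", "--output", "a.rpm"], check=True)
    subprocess.run(["./build.sh", "--output", "b.rpm"], check=True)

    # If the digests match, the build is reproducible; a mismatch against a
    # published artifact is evidence of tampering or an unpinned input.
    assert sha256("a.rpm") == sha256("b.rpm")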