I am developing a system that will be used to manually inspect data and identify things within it. I expect that, in some cases, these identifications will be used to train machine learning models. Is there an existing license that I can apply to the software that would require its outputs and anything derived from them (i.e. the identifications and model weights) to be made public? Something like the GPL, but to democratize access to training data and models created downstream.<p>The application is in a niche scientific field. I am not worried about a lack of users, and I expect many users will align with the ethos I am proposing. I am simply wondering if a license or arrangement like this has been created already.
First question to ask yourself: are those identifications copyrightable?<p>It sounds like they are simply facts about the data, so probably not. In that case, you could make users agree to a contract requiring disclosure before they can download and use the software, but this would be very non-free and GPL-incompatible.<p>If they were copyrightable, they would be owned not by you but by the person using your software to create them. You could require copyright assignment from anyone who uses your software, but that would also be very non-free and GPL-incompatible.<p>You might also like to read this:<p><a href="https://salsa.debian.org/deeplearning-team/ml-policy" rel="nofollow">https://salsa.debian.org/deeplearning-team/ml-policy</a>