It's interesting to consider how you might use a license to prevent AI training without being overly restrictive.

Here is an example of a license that attempts to prohibit training directly. The problem is that software under such a license arguably can't be used in any part of a system that might be involved in training or inference (in the operating system, for example). Somehow you would need to additionally specify that the software is used directly... but how, and what would that mean? This is left as an exercise for the reader, and I hope someone can write something better:

    The No-AI 3-Clause License
This is the BSD 2-Clause License, unmodified except for the addition of a third clause. The third clause is intended to prohibit, for example, use in the training of language models, and likewise use during language model inference. Such language models are used commercially to aggregate and interpolate intellectual property, with no acknowledgement of authorship or lineage and no attribution or citation. In effect, the intellectual property used to train such models becomes anonymous common property, and the social rewards (e.g., credit, respect) that often motivate open source work are undermined.

    License Text:
<a href="https://bugfix-66.com/7a82559a13b39c7fa404320c14f47ce0c304facc51cdacbba3f99654652bf428" rel="nofollow">https://bugfix-66.com/7a82559a13b39c7fa404320c14f47ce0c304fa...</a>