I don't understand the argument here.<p>Yes, sometimes code is returned that is a verbatim reproduction of the training data. That can be prevented if need be.<p>What I really don't understand is how some people are complaining about GPL'ed code being used for training.<p>What's the difference between a machine looking at the code and learning from it, and a human being doing the same? As long as the code isn't patented, there's no reason why I shouldn't be able to look at GPL'ed code and implement the idea using my own code.<p>In other words, according to those who object to using GPL'ed code for ML training, is <i>every</i> implementation a derived work if I looked at GPL'ed code that implemented the same algorithm? Where's the line that separates plagiarism from original work? Is there even such a line? Does it matter whether the GPL'ed code ends up encoded in human neurons or in network weights after being looked at, and if so, why?