There's been a lot of discussion of the copyright questions surrounding tools like Copilot, DALL-E, and ChatGPT. Is there any measure of novelty for the output of these generative systems? For example, it'd be interesting to see a secondary system that retrieves the most 'similar' examples from the training corpus for a given output (though, conveniently, that corpus is often private).<p>Clearly, a system that just spits out a training example in response to a prompt is less 'novel' than one that integrates bits and pieces from a wide corpus into something new, but I'm not sure how quantifiable that is in practice.
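<p>To make the "secondary system" idea a bit more concrete, here's a rough sketch of what I have in mind for text, assuming you actually have the training corpus in hand (which is the hard part). Everything here is my own illustration: the function names, the word-trigram representation, and the use of Jaccard overlap as the similarity measure are assumptions, not how any of these products work. It only catches near-verbatim copying; paraphrase or stylistic borrowing would need something like embedding similarity instead.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int = 3) -> Set[NGram]:
    """Set of lowercase word n-grams (hypothetical choice of representation)."""
    tokens = text.lower().split()
    if len(tokens) < n:
        # Too short for a full n-gram: treat the whole text as one unit.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    """Jaccard similarity: |intersection| / |union|, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_training_example(
    output: str, corpus: List[str], n: int = 3
) -> Tuple[int, float]:
    """Return (index, similarity) of the corpus entry most similar to output."""
    if not corpus:
        raise ValueError("corpus must not be empty")
    out_grams = ngrams(output, n)
    scores = [jaccard(out_grams, ngrams(doc, n)) for doc in corpus]
    best = max(range(len(corpus)), key=scores.__getitem__)
    return best, scores[best]


def novelty(output: str, corpus: List[str], n: int = 3) -> float:
    """Crude novelty score: 1 minus similarity to the closest corpus entry.

    0.0 means the output reproduces a training example's n-grams exactly;
    1.0 means it shares no n-grams with any single training example.
    """
    _, similarity = nearest_training_example(output, corpus, n)
    return 1.0 - similarity


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "a stitch in time saves nine",
    ]
    for sample in [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox sleeps",
        "completely unrelated words appear here today",
    ]:
        idx, sim = nearest_training_example(sample, corpus)
        print(f"{sample!r}: nearest=#{idx} similarity={sim:.2f} "
              f"novelty={novelty(sample, corpus):.2f}")
```

<p>Note that this scores against the single closest example, so output stitched together from many different sources would look 'novel' by this measure even if every fragment was copied. That's pretty much the quantifiability problem I mean.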