Ask HN: To what extent is a lack of copyrighted (i.e. protected or non-public) creative content (e.g. novels, screenplays and movies) in training data limiting the potential sophistication of AI-based computer-generated content?<p>Example:<p>GPT-3 may be able to superficially impersonate Tyrion Lannister or describe the world he inhabits, but because OpenAI can’t (lawfully) use George R.R. Martin’s novels as training data, it will never be able to generate a convincing persona/interactive experience with Tyrion.
No, for at least two reasons.<p>(1) Google can out-sue Disney and Sony Records when it comes to copyright.<p>(2) GPT-3 is not that good and won't be. (e.g. it is not a strength that $10 million+ in CPU power was used; it would be a strength if they could do the training on a TI-84 calculator.)<p>Something like that might be possible, but it will probably be a system that has a large "knowledge base" of some kind about the world, language, etc. and takes the book as an argument. (e.g. you can't say that a system is capable of "reading comprehension" if it can only understand a text that is in the training set.)