I do think OpenAI has a point in what they're saying: if we expect human-level competency from AI, it needs to be able to see and train on human-accessible content, ideally with a similar distribution.<p>For example, I make an open source Firefox web extension for filtering internet content with my own classifier. It literally could not exist without being trained on web content, much of which is copyrighted. Requiring that I either a) use only attributed data or b) detect and exclude copyrighted content when trying to build something representative of my source distribution (e.g. the web) sounds like a recipe for a poor outcome. Now maybe my addon isn't your cup of tea - but what if you found out that the next generation of uBlock Origin etc. could not be as effective because legislation kept it from using an AI model? Legislating too heavily in this area will, I believe, have a tremendous chilling effect on small businesses and open source folks trying to innovate in AI.<p>I've also worked commercially on two closed source machine learning models, but the domains were restricted enough that web content was not a particularly helpful input. One did all right, and one did not. Seeing bets succeed and fail gives me an appreciation for the long-term, uncertain bets that OpenAI has been making for ages and that are finally coming to fruition. I think without businesses being willing to make those bets, the GPU-hours would have been hard to pay for.<p>I've wondered whether a different way out of this is to not restrict the use of copyrighted material in the training process itself, but instead to consider only the final works created. Of course there are thorny problems there, too, but I don't see that having the same chilling effect on research, and it would probably have a lesser effect on business as well.
One thing I think is clear though: we've reached a tipping point in the US, similar to 1998 when the DMCA was enacted, where technology is forcing us to think carefully about what copyright means.<p>So I have a question for those on HN who have meaningfully worked on the creation not just of AI-generated content, but of an AI model that others use freely or commercially: what seem like promising paths forward here?
Or to those working in copyright law (like @williamcotton): how do you see the status quo and potential paths forward?