> isn't this a slam dunk case? Meta literally published a paper where they said "We trained this AI on Intellectual Property which we knew had been obtained without the owners' consent."

Making/receiving copies without the rightsholder's authorization can still be permissible - it'll come down to a fair use analysis. The purpose seems highly transformative to me (and "The more transformative the new work, the less will be the significance of other factors"), but the other factors could swing the other way.

Worth noting that when Google Books was held to be fair use, the amount/substantiality factor didn't weigh against Google despite its storing full copies of millions of books, because *"what matters in such cases is not so much 'the amount and substantiality of the portion used' in making a copy, but rather the amount and substantiality of what is thereby made accessible to a public"*. The analogue here would be the amount present in generated responses, as opposed to everything the model was trained on.

> I suspect we're about to hear some arguments from AI-maximalists that LLaMA is sentient and that deleting it would be akin to murder - and wiping out AIs trained on stolen property is literally genocide.

Not sure I can call this a strawman, since there will inevitably be someone somewhere making an argument like this, but it's not a defense being used in the lawsuits in question.

My primary issues with "wiping out AIs trained on stolen property" are:

1. It's not just LLMs that are trained this way. If you're making a model to segment material defects or detect tumors, you typically first pre-train on a dataset like ImageNet before fine-tuning on the (far smaller) task-specific dataset (see the sketch at the end of this comment). Even if you believe LLMs are mostly hype, there's a whole lot else - much of it fairly uncontroversially beneficial - that you'd be inhibiting.

2. Copyright's constitutional basis is "To promote the Progress of Science and useful Arts". Wiping out existing models, and likely stifling future ones, seems hard to justify on that basis. Ensuring rightsholders profit is intended as a means to achieve such progress, not a goal in and of itself to which progress can take a back seat.

3. I don't believe stricter copyright law would help individuals. Realistically, developers training models would go to Reddit/X/Github/Getty/etc., which can sell user content licensed to them via ToS agreements, and there's little incentive for those companies to pass the profit on to users beyond maybe some temporary PR moves. Much of what's possible for open-source or academic communities may no longer be, on account of licensing fees.

4. It doesn't seem politically viable to demand that models be wiped out. Leading in the field, and staying ahead of China, is currently seen as a big, important issue - we're not going to erase our best models because the NYT asked. One could hope for mandatory licensing instead - I think it'd still likely be a negative for open-source development, but it's more plausible than deleting models trained on copyrighted material.
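
As a footnote to point 1, here's a minimal sketch of that pre-train-then-fine-tune workflow using PyTorch/torchvision. The two-class defect task, the `fine_tune` helper, and the data loader are hypothetical placeholders for illustration, not taken from any system discussed above:

```python
# Sketch: fine-tune an ImageNet-pre-trained backbone on a small task dataset.
import torch
import torch.nn as nn
from torchvision import models

# Start from weights pre-trained on ImageNet - the step at issue: the model
# has already been trained on a large general-purpose dataset.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pre-trained backbone so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with one sized for the downstream
# task, e.g. 2 classes for "defect" vs "no defect" (hypothetical).
num_task_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_task_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def fine_tune(loader, epochs=5):
    # `loader` yields (images, labels) from the far smaller task dataset.
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

The point being: the task-specific data here might be a few thousand images you collected yourself, but the model's usefulness still rests on the much larger pre-training corpus.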