Authors sue OpenAI for using their works without proper licensing

43 points by ironyman, over 1 year ago

19 comments

transitivebs, over 1 year ago

<a href="https://archive.ph/Kn5xq" rel="nofollow noreferrer">https://archive.ph/Kn5xq</a>

slavboj, over 1 year ago

"But Defendants’ LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create"

This kind of luddism sees copyright as a way to enrich rights holders, as opposed to "promoting the progress of science and the useful arts".

rezonant, over 1 year ago

It appears this lawsuit is complaining that ChatGPT can write fan fiction and they don't like that.

I was on board initially, thinking we were talking about OpenAI ingesting Game of Thrones as training material, but it appears George et al. are just mad because it can make stories with their characters.

This is far from the authorship/copyright problem of AI.

papercrane, over 1 year ago

Edit: This comment was for a different headline, doesn't apply anymore.

Terrible headline. They're not suing for theft, they're suing for copyright infringement.

The rest of the article is reasonable, and they link to the complaint, which is something every article about a lawsuit should do.

<a href="https://fingfx.thomsonreuters.com/gfx/legaldocs/xmvjlbqbnvr/AUTHORS%20GUILD%20OPENAI%20LAWSUIT.pdf" rel="nofollow noreferrer">https://fingfx.thomsonreuters.com/gfx/legaldocs/xmvjlbqbnvr/...</a>

1vuio0pswjnm7, over 1 year ago

The complaint:

<a href="https://ia904703.us.archive.org/10/items/gov.uscourts.nysd.606655/gov.uscourts.nysd.606655.1.0.pdf" rel="nofollow noreferrer">https://ia904703.us.archive.org/10/items/gov.uscourts.nysd.6...</a>

Named plaintiffs include David Baldacci, John Grisham and Scott Turow.

mattw2121, over 1 year ago

Here's a point that I struggle with. Let's imagine a future where technology has progressed to the point that a machine can become "assisted memory" for a human. This could be useful for degraded memory conditions or even just to buff up human capabilities. In this scenario, how do we deal with licensing and copyright? The "memory" is trained on books, artwork, etc., and then human intelligence accesses that computer-aided memory and constructs something new.

Seems like this lawsuit could set a precedent that the future I describe would not be allowed.

nemo44x, over 1 year ago

How is it different from a search index? It takes existing content as an input, processes it and then outputs data structures from it. Those data structures are then used to power full-text search.

An LLM does the same thing, but instead of search results it emits a stream of tokens.
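
To make the search-index half of that comparison concrete, here is a minimal sketch of an inverted index in Python. The function names (`build_index`, `search`) and the sample documents are made up for illustration. A real search engine does much more, such as ranking, stemming and compression, so treat this as a toy rather than a description of any actual system.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in WORD.findall(text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents standing in for indexed content.
docs = {
    "a": "The old lighthouse stands on the northern shore",
    "b": "A storm reached the northern shore at dawn",
    "c": "The lighthouse keeper slept through the storm",
}
index = build_index(docs)
print(sorted(search(index, "northern shore")))
print(sorted(search(index, "lighthouse storm")))
```

The index keeps only word-to-document pointers, so a query returns references to the original content rather than newly generated text.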

Supply5411, over 1 year ago

DMCA 2024 - A nice big report button for when generated content is too close to copyrighted content. It is then on the AI company to supplement the training materials around that content, to dilute the generation of content that could be seen as infringing. So instead of George RR Martin prequels with the same names and characters (because of a lack of training materials), it generates something more generic for the input prompt.

Win/win?

Nevermark, over 1 year ago

There seems to be confusion. :)

Or disagreement anyway, about how comparable photocopiers & copyright are to generative models and protection from unauthorized automated style reproduction.

How I look at it:

1. In both cases, reproduced copies or reproduced styles, automation destroys economic incentives for creators to make any sustained effort.

Without economic protection, it isn’t even a question of less motivation. Creators, like everyone else, need to eat.

2. So we protect creative works from complete copies in order to have more creative works.

And it is primarily about automation and mass reproduction.

Nobody is worried about people hand copying Atlas Shrugged.

3. But we also protect copyrighted works from partial copying.

Only copying chapters 1-3? Not allowed.

Only copying the plot but changing all the names, locations, fashion and colors? Not allowed.

4. So now it turns out a different substantial part of a work can be copied via automation. Its style.

Well, if you can protect a work’s plot from automated copies, why not a work’s style?

It is a substantial piece of a creative work.

Reasons for protecting style come down to protecting any major part of a copyrighted work.

The only thing different now is we have “style reproducers”.

So we have to decide, is this essentially the same situation as copyright addresses, or not?

5. It is.

The exact same trade-offs between protection and incentivization exist for extracted & mass reproduced style as they do for extracted and reproduced plot.

rendall, over 1 year ago

This is good, and this is inevitable. Creators have control of their copyright, which should include permissions to be used in AI training.

beej71, over 1 year ago

Is it illegal to SHA-256 a book without the author's permission?

If not, how are we going to legally codify the difference between that and an LLM?
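
For readers unfamiliar with what "SHA-256 a book" means in practice, here is a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` and the placeholder strings are assumptions for illustration, and any text could stand in for the book. It only shows the mechanical properties of the hash and takes no position on the legal question.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Placeholder inputs standing in for a book's text (public-domain opening line).
short_text = "Call me Ishmael."
long_text = "Call me Ishmael. " * 10_000
altered_text = "Call me Ishmael!"

samples = [("short", short_text), ("long", long_text), ("altered", altered_text)]
for label, text in samples:
    digest = sha256_hex(text)
    # The digest length is always 64 hex characters, whatever the input size.
    print(f"{label:8s} input chars={len(text):6d} digest chars={len(digest)} {digest[:16]}...")
```

The digest has the same fixed size whether the input is one sentence or a whole book, and a one-character change produces a different digest.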

1vuio0pswjnm7, over 1 year ago

The complaint mentions LibGen, Z-Library and Bibliotik, and Sci-Hub in a footnote.

Thought experiment:

What if every person who downloads materials from the above sources claimed that they were doing so only to "train AI"?

Many such persons who download from those sources are probably doing so for noncommercial purposes, for example, academic research. Whereas, according to this complaint, OpenAI "intend[s] to earn billions from this technology."

ChrisArchitect, over 1 year ago

[dupe]

More discussion days back when this was news:

<a href="https://news.ycombinator.com/item?id=37585157">https://news.ycombinator.com/item?id=37585157</a>

<a href="https://news.ycombinator.com/item?id=37599261">https://news.ycombinator.com/item?id=37599261</a>

Charon77, over 1 year ago

I don't get it.

So AI works can't be copyrighted, but training AI using copyrighted materials is copyright infringement?

HerculePoirot, over 1 year ago

GRR Martin, the author of Game of Thrones, had the audacity to join this lawsuit. The only thing I expect from AI in this context is NOT to reproduce the shitshow GOT ended up being.

I'm wondering what the authors will do if we develop AIs that are able to find new artistic styles that are not in the dataset? Would it still pose a problem to use their content to learn how NOT to imitate them?

Seems it is possible in collaborative filtering:

> Yes, in collaborative filtering, finding empty classes is possible. To recommend items for these gaps, utilize adjacent class information or employ techniques like matrix factorization, content-based filtering, or hybrid systems. These methods predict preferences based on observed patterns, similarities between items, and user preferences, filling in missing data.

Tade0, over 1 year ago

I see a business opportunity for someone who can produce a watermark that will visibly poison the learning set.

laudefra, over 1 year ago

Interesting how this particular case is generating such conflicting political views.

smegsicle, over 1 year ago

Currently anything generated by an LLM is not copyrightable, right? So are they really threatened by the public domain, or what?

kayhi, over 1 year ago

Maybe the default should be opt-in instead of opt-out. Why should copyright holders now have to do work to protect their already protected works?
