New York Times considers legal action against OpenAI as copyright tensions swirl

134 points by 8ig8 almost 2 years ago

31 comments

pimterry almost 2 years ago

Honestly, I think generative AI losing a massive copyright showdown is inevitable at this stage.

It's extremely easy to get the latest generation of AIs to produce outputs that in many fields sans-AI would be trivially considered IP infringement.

While there are many interesting reasonable legal & technical arguments that it's not, the result completely undermines copyright protections regardless. If that's accepted at scale, copyright in practice will change completely. In effect, the choices are "block this, or entirely destroy copyright protections in many industries". You can't allow this without eventually allowing everybody to simulate their own NY Times reporters, produce their own Marvel movies, and create their own Taylor Swift albums.

If you do allow that, the many many affected industries have catastrophic problems.

Problematic though copyright laws are, I see no world where all those protections go away any time soon, and so if the courts don't agree to protect copyright already in this scenario, then it will eventually be legislated to make that happen. AI consuming copyrighted data and producing an output has to be considered a derivative work (or indeed, the model itself will be considered a derivative work) or IP protections are effectively broken.

There's a grace period now while we work our way there, but the politics is pretty clear and with no plausible path to "let's drop copyright completely" ASAP, I just don't see any other result in the medium term. Doesn't mean the end of generative AI by any means, just a slowdown as we move to a world where you need to negotiate rights and buy data to feed it first, instead of scraping everybody else's for free.

ojosilva almost 2 years ago

IANAL, but copyright protections are pretty much tied to content and format and not to the idea itself, with the intent of preventing (or putting a price on) the copying of original works. The Times will have a very hard time proving that their content is being re-marketed by OpenAI. Having a competing product based on your ideas.

Compare:

"Steve Jobs [was] a tyrant": https://www.nytimes.com/2011/10/07/technology/steve-jobs-defended-his-work-with-a-barbed-tongue.html

Against:

"Whether to describe SJ as a tyrant is a matter of perspective...": https://chat.openai.com/share/28633f0c-007f-48b6-a615-1581c3ef3041

The general way LLMs work does not preserve content in its original form: the ideas they contain are extracted and clustered statistically. As an ELI5 refresher, an LLM reads 2 million NY Times articles and records that after the word "Steve" there are a lot of "Jobs" followed by a lot of "was a genius/tyrant", "founded Apple", etc. Then LLMs respond to the user question "Who was Steve Jobs?" using this complex net of token/word stats. Is that fair use? I think OpenAI lawyers will not even tap the fair use question, they will simply state that no copy happened, just a statistical collection of words from various sources.

And importantly: no LLM source is really prevalent, so the end result cannot even be traced back to the source, especially if multiple, similar news sources are being fed to training. I have no idea how the Times is going to prove that it's _their_ news.
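
For anyone who wants the ELI5 above in concrete form, here is a rough sketch of the "count which word follows which" idea. It is only a toy bigram counter, not how a real LLM works (those use neural networks over tokens, not a lookup table), and the four sentences are invented for illustration, not taken from any NYT article. The function names `build_bigram_counts` and `most_likely_next` are made up for this example.

```python
from collections import Counter, defaultdict


def build_bigram_counts(texts):
    """Count, for every word, how often each other word directly follows it."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        # Pair each word with its immediate successor.
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented toy corpus (an assumption for illustration, not real training data).
toy_corpus = [
    "Steve Jobs founded Apple",
    "Steve Jobs was a genius",
    "Steve Jobs was a tyrant",
    "Steve Wozniak founded Apple",
]

counts = build_bigram_counts(toy_corpus)
print(most_likely_next(counts, "Steve"))
print(dict(counts["Jobs"]))
```

Note that the table stores only counts pooled across all four sentences, which is the point made above about the output being hard to trace back to a single source. Whether that holds for a real model at scale is a separate question this toy cannot answer.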

rurp almost 2 years ago

I don't think it's an exaggeration to say that LLMs might lead to the end of the open web, or at least a drastically reduced version of it. So much of these models' utility is in directly competing with the producers of the training data. Content creators and aggregators are seeing more and more reason to restrict and limit access, to avoid having AI companies consume all of their data and then be the ones making money from it going forward.

I fear that LLMs are going to cause the internet to be a much worse and less open space.

akolbe almost 2 years ago

There is a very real risk that we end up with an inferior product cannibalizing a superior one and driving it out of business.

Moreover, AI would seem to be even more susceptible to capture and manipulation than conventional media.

When it's a question of guiding thought I prefer the humanities to tech. (Same with art.)

bubblethink almost 2 years ago

> If, when someone searches online, they are served a paragraph-long answer from an AI tool that refashions reporting from The Times, the need to visit the publisher's website is greatly diminished, said one person involved in the talks.

If, when someone reads a newspaper, they are served a paragraph-long answer from an NYTimes reporter that refashions reporting from local sources, the need to interact with the local sources is greatly diminished.

machdiamonds almost 2 years ago

Don't humans operate similarly? We gain knowledge through experiences. These AI models effectively condense a vast amount of experience data into weights. Considering the global race in AI advancements, I'm skeptical about the success of these copyright claims. I do find it hypocritical that OpenAI says that other LLMs can't be trained on data generated by their LLMs.

PeterisP almost 2 years ago

I think that the proper outcome for all of this would be an acknowledgement that the current copyright laws regulate this aspect very poorly, and that the key parts of any such legal action are at the not-really-described edges of law because these edges weren't relevant until now; and so instead of waiting for courts to rule on how law-as-written-now applies and accepting these rulings, we will likely get some new legislation explicitly setting what the legal norms should be.

In the short term, of course, the existing law matters, but the main discussion should be not on how to apply existing law but how to ensure that the new laws match what we-the-people would want.

baby-yoda almost 2 years ago

These mega LLMs that can autonomously roam the web and consume original content are basically the "I made this" meme[0] and having some legal precedent would be good for all users of the web.

[0] - https://knowyourmeme.com/memes/i-made-this

jiscariot almost 2 years ago

My concern isn't copyright law, but that if trained on the NYT, these LLMs are going to be favorable to starting conflicts in the Middle East.

https://fair.org/home/20-years-later-nyt-still-cant-face-its-iraq-war-shame/

oefrha almost 2 years ago

Hopefully soon enough (within a decade?) we’ll all be able to run large language models on cheap consumer devices, and model weights containing everything including NYT will be floating around in the form of warez readily consumed by anyone with a modicum of savvy, whether NYT likes them or not. They can’t stop progress.

brrrrrm almost 2 years ago

What’s going to be the name used for the laws that attempt to tackle machine paraphrasing?

toss1 almost 2 years ago

> A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper's staff.

This seems to me to be completely standard in the newspaper industry. Many times every week, I see stories in the form "The [Major_News_Outlet] reports that [Event_X occurred] or [their investigation revealed Y] and here are the details [...]".

Copyright protects the expression of an idea, not the idea itself. If you write a history of Isaac Newton or the invention of semiconductors, I cannot copy that wholesale and sell it as mine, but nothing prevents me writing my own version, even using the same facts and citing your work.

I'm quite sure that I could provide a service where a bunch of workers read NYT articles and write brief summaries. I'm not sure they would even need citations, as long as we don't copy chunks wholesale.

If OpenAI is simply parroting the words of the NYT articles without Fair Use constraints (short blurbs), it seems they have a problem. If they are fully re-writing them into short non-copying summaries, it seems the NYT has a problem.

It'll be interesting to see how the courts sort this out.

t_luke almost 2 years ago

The precedent people should be paying much more attention to is sampling in music. When it first arose, it really wasn’t clear what status it had. There was at least a decade when people basically thought it was legal to use small samples of other recordings because they were small and the new use turned them into something unrecognisably different. Which was kind of logical, actually, but turned out not to be true!

The current legal requirement to get clearance for all samples only arose after a bunch of court cases in the late 80s/early 90s, mostly involving quite obscure musicians.

There are a lot of people on here who assume that ‘logic will prevail’ in the courts on questions like the use of copyrighted data in training data. History shows that this really isn’t a safe assumption. The courts have historically been extremely favorable to copyright holders. It would be foolish to underestimate the legal risk to OpenAI et al. here.

8ig8 almost 2 years ago

Interesting point…

> A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper's staff.

CatWChainsaw almost 2 years ago

"I'll keep saying it every time this comes up.

I LOVE being told by techbros that a human painstakingly studying one thing at a time, and not memorizing verbatim but rather taking away the core concept, is exactly the same type of "learning" that a model does when it takes in millions of things at once and can spit out copyrighted writing verbatim."

Personally I think they argue that way because they get off on being contrarian out of spite, but to me it's just a signal of maliciousness and stupidity all at once.

FrustratedMonky almost 2 years ago

If a human reads something, it goes into their brain, and it becomes an influence on future works they produce.

This doesn't mean that 'copyright' extends into my brain. A company can't copyright what I'm thinking about. And what if I do try to paraphrase something from memory, from a few sources, and happen to spit out a very similar sentence? Am I breaking the law?

To go further: all knowledge is pretty much fed into a human from hundreds of books, movies, TV, internet, all pumped in from birth. So everything in the brain is a product of something with a copyright, and anything produced is some amalgamation of copyrights.

Why not use a similar argument for AI? It is clear when asking it to do something like "write a screenplay for Othello using dialog like Tarantino, but with a bit of style like Baz Luhrmann" that what it produces is 'as unique as a human' would be, or just as filled with things that have copyrights.

voytec almost 2 years ago

> if a federal judge finds that OpenAI illegally copied The Times' articles to train its AI model, the court could order the company to destroy ChatGPT's dataset, forcing the company to recreate it using only work that it is authorized to use.

I'd like to see it happen, but it sounds unrealistic.

whywhywhywhy almost 2 years ago

If writing a few paragraphs around something someone else said is copyrightable to you then isn’t GPT writing a few paragraphs around your work copyrightable to OpenAI too…

mediumsmart almost 2 years ago

I think anybody should have the right to protect the word combinations they own by not publishing them on the internet.

pierrefermat1 almost 2 years ago

https://www.youtube.com/watch?v=MFKV48ikV5E

Relevant to the article: Large Language Models Meet Copyright Law at Simons

mensetmanusman almost 2 years ago

“In the end lawyers saved humanity from an all powerful AI.”

bubblethink almost 2 years ago

I think all OpenAI needs to do is scan physical newspapers and OCR them. No ToS to agree to, and no ToS on print editions.

zb3 almost 2 years ago

It's time to abolish copyright.

exabrial almost 2 years ago

Good. Literally anyone’s copyrighted comments on the internet should get a settlement

olgeni almost 2 years ago

They own copyright on hallucinating weapons of mass destruction? :D

robbywashere_ almost 2 years ago

incoming backroom payment deals with publishers. "OpenAI now features training data from our partners X, Y, and Z"

villgax almost 2 years ago

Sue these hypocritical fair-use citers who prevent people from training on their own outputs. Force them to reveal their entire training set for generating oblong statements.

honeybadger1 almost 2 years ago

A skirmish to not use our collective acquired knowledge and hide it behind selfish...capitalist gain.

vldchk almost 2 years ago

While (in general) I agree with the arguments against “copyright hell”, this particular case is not about copyright itself but about the consequences of GenAI for an entire industry.

Journalists exist for a reason. Yes, they work with facts, and very often open facts, but they still assemble those facts in a certain way to construct a narrative, connect the dots and tell us a story (not counting cases where a journalist works with their sources and produces unique inside information). Then OpenAI comes, says “thank you very much” and assembles all of the journalists' work into one Uber Knowledgeable Journalist who can answer all of your questions.

So far so good: we create a public good service, and copyright holders are in shambles.

Until you start making money on it.

That’s where the problem is.

If OpenAI were a non-profit organization like the Wikimedia Foundation, which just wants to make the internet a better place, you could not find many arguments to support an NYT lawsuit. But monetization changes everything.

Basically, NYT is not worried about the reuse of its text as such; it is worried that no one will want to visit NYT anymore and will instead pay Microsoft/Google and get all their answers from them.

Let’s take an example. There was a famous story in which an FT journalist discovered a massive fraud in Wirecard's accounting, which essentially led to the death of that organization. Those articles were the result of multi-year reporting work in which the journalist, piece by piece and step by step, collected facts, met people, and eventually spotted the gap. Now, in the age of Bard/Bing/ChatGPT, you don’t need to read the original article to know all of this. You can ask a search engine or chatbot and get an essential rephrasing of the original reporter's work. You no longer need to go to FT, pay for their paywall, watch their ads, etc. Effectively, FT makes a huge investment in its people to let them spend 2 years on this issue and report it, and now gets 0 leads to its website because all of them are eaten by Google and Microsoft, who will sell you their ads and retain you in their monetized products.

Imagine that you built a for-profit paid library for some task. You make the code available behind a paywall and ask people to pay you to get it and solve their problems. Then Microsoft comes, sneaks past the paywall, scrapes your code and publishes a recompiled and slightly optimized version in open access, so no one ever needs to go to your website again; they just ask Microsoft to show them your code.

Would you be happy?

For me, all of this makes the case not as easy and straightforward as it seems to be: “bad copyright holders against the progress of humanity”.

At the end of the day, if NYT/FT/New Yorker and others stop publishing their work and fire all their journalists, will ChatGPT tell us stories of the same depth as those we read there?

aero-glide2 almost 2 years ago

Copyrights and patents are holding back humanity.

gmerc almost 2 years ago

Social media, especially Facebook and Google News, devalued news by commoditizing it.

News is trying to avoid the next generation of tech doing that to the long tail of data.
