Things are about to get worse for generative AI

403 points by eddyzh, over 1 year ago

101 comments

ctoth, over 1 year ago
Everybody is just buying into the corporate narrative that anyone can actually own these sorts of things.

Who truly owns the tales of Snow White and Cinderella?

These stories didn't originate with Disney; they are part of a rich tapestry of folklore passed down through generations. Disney's success was partly built on adapting these existing narratives, which were once shared and reshaped by communities over centuries.

This conversation shouldn't just be about the technicalities of AI or the legalities of copyright; it should be about understanding the deep roots of our shared culture.

At its core, culture is a communal property, evolving and growing through collective storytelling and reinterpretation.

The current debate around AI and copyright infringement seems to overlook this fundamental aspect of cultural evolution. The algorithms might be new, but the practice of reimagining and repurposing stories is as old as humanity itself.

By focusing solely on the legal implications and ignoring the historical context of cultural storytelling, we risk overlooking the essence of what it means to be a creative society.

As a large human model (no really, I could probably lose some weight), I think it's just silly how we're all sort of glossing over the fact that Disney built their house of mouse on existing culture, on existing stories, and now the idea that we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just... bonkers.
Havoc, over 1 year ago
To me that's the wrong question.

Everyone knew it was trained on copyrighted material and capable of eerily similar outputs.

But it's already done. At scale. Large corps committing fully. There is no chance of that toothpaste going back in the tube.

It's a bit like when big tech built on aggressive user data harvesting. Whether it's right, ethical or even legal is academic at this stage. They just did it, effectively without any real informed consent by society. Same thing here: 9 out of 10 people on the street won't be able to tell you how AI is made, let alone comment on copyright.

So the right question here is: what now? And I suspect that, much like with tracking, the answer will be: not much.
niemandhier, over 1 year ago
Should not be a problem in the EU. Articles 3 and 4 of the "Copyright in the Digital Single Market" Directive already regulate this.

Summary by Wolters Kluwer: "[…] Everyone else (including commercial ML developers) can only use works that are lawfully accessible and where the rightholders have not explicitly reserved use for text and data mining purposes."

AFAIK they are discussing something like a robots.txt to flag stuff as "not for training". You will probably be expected to implement some safeguards, and of course the end user will have to be careful in their use of the generated things.

Source at Kluwer: https://copyrightblog.kluweriplaw.com/2023/02/20/protecting-creatives-or-impeding-progress-machine-learning-and-the-eu-copyright-framework/#:~:text=Taken%20together%2C%20these%20two%20articles,public%20Internet)%20to%20train%20ML

EU legal text: https://eur-lex.europa.eu/eli/dir/2019/790/oj
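To make the robots.txt idea concrete, here is a minimal sketch of how a crawler could check such an opt-out before collecting a page for training. It assumes the reservation is expressed as an ordinary robots.txt rule, which the Directive itself does not prescribe; the crawler names `ExampleTrainingBot` and `ExampleSearchBot` and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site reserves its content from one (made-up)
# training crawler while leaving every other crawler alone.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    print(may_fetch("ExampleTrainingBot", url))  # the training crawler
    print(may_fetch("ExampleSearchBot", url))    # any other crawler
```

A flag like this only works if crawlers choose to honor it, which is presumably why the comment also expects additional safeguards.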
koliber, over 1 year ago
The responsibility for ensuring that copyrights were not violated falls on the person publishing the work. Whether they drew something themselves, hired an apprentice artist with no legal training to draw something, took a photograph of something, or used AI to create an image should not matter.

Why does anyone assume that ChatGPT or other tools would NOT produce previously-copyrighted content?

I can see a naive assumption that since it is "generated" it's original. However, that assumption falls apart as soon as you replace "ChatGPT" with "junior artist". Tell them to draw a droid from a sci-fi movie, don't mention anything else. Don't say anything about copyrights. Don't tell them that they have to be original. What would you expect them to produce?
appplication, over 1 year ago
There are an alarming number of responses seemingly completely unaware of the core thrust of the article (and NYT lawsuit). ChatGPT was able to reproduce and publish significant portions of NYT articles, completely verbatim for hundred-to-thousand word stretches.

It's not derivative work. We're way past that. NYT has an exceptionally strong case here and anyone arguing about the merits of copyright is way off the mark. This court case is not going to single-handedly undo copyright. OpenAI has very little going for them other than "this is new, how were we to know it could do this". So knowing that, the currently trained models are in a very sticky situation.

Further, I don't see NYT settling. The implications are too large, and if they settle with OpenAI, they will have a similar case pop up with every other model. And every other publisher of digital content will have a similarly merited case. This is an inflection point for generative AI, and it's looking like it will be either much more expensive or much more limited than we originally thought.

A side effect of this: I am predicting that we will start to see a rise in "pirate" models. Models that eschew all legality, that are trained in a distributed fashion, and whose weights are published not by corporations but by collectives (e.g. torrent models). There is a good chance we see these surpass the official "well behaved" models in effectiveness. It will be an interesting next few years to see this play out.
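For a sense of how "verbatim for hundred-to-thousand word stretches" could be measured, here is a small sketch that finds the longest run of consecutive words shared by a source text and a model output. It is only an illustration with toy strings, not the method used in the lawsuit; the function name `longest_verbatim_run` is made up.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(source: str, output: str) -> list[str]:
    """Return the longest run of consecutive words that appears in both texts."""
    a = source.split()
    b = output.split()
    # autojunk=False so that frequent words are not silently ignored on long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a : match.a + match.size]


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank"
    output = "a model wrote that the fox jumps over the lazy dog again"
    run = longest_verbatim_run(source, output)
    print(len(run), " ".join(run))
```

Whitespace splitting is a deliberate simplification; a real comparison would need to normalize punctuation and casing first.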
marckrn, over 1 year ago
I might be a bit idealistic, but I've always believed that the core purpose of art and publishing should be to influence culture and society, not just to make a heap of money. That's why I feel original work needs its protection, but it should enter the public domain much sooner to fuel creativity and inspiration. We should be thinking in terms of a few years for this transition, not decades.
keiferski, over 1 year ago
These don't seem all that difficult to fix to me. Most of the examples are not really generic, but are shorthand descriptions of well-known entities. "Video game plumber" is practically synonymous with "Mario" and anyone that has the slightest familiarity with the character knows this.

Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

1. The describe command can describe an image in Midjourney. I imagine other AI tools have similar features: https://docs.midjourney.com/docs/describe
WhiteNoiz3, over 1 year ago
As I understood it, the legal precedent for generative AI is the same one that allows Google to scrape websites in order to index them for search for the common good. Google can also display cached versions of websites, which is the original content of those sites. No one is going to say that Google is committing copyright infringement just because it is showing content from other websites verbatim. So I think this is a weak argument. AI would be useless if we had to scrub all cultural references and popular IPs (even not so popular ones).

Personally, I think generative AI *should* be able to provide links to similar source material in the training data. This would be the barest way to compensate those who have contributed to training the AI. I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material. Plus I think having sources adds a layer of transparency and aids users in understanding when content is hallucinated vs. not. People should be able to opt out of having their content used for training and be able to confirm that it has been removed for future iterations. Let's be honest that AI companies are just trying to avoid lawsuits by keeping it secret. These are areas where I think regulation can help rather than worrying about doomsday scenarios.
preommr, over 1 year ago
We need clearer laws that only apply to generative AI. Too many comparisons and parallels are being drawn to actual people: "Like what if someone learned how to draw by watching trademarked material, and then accidentally produced it?" But these models aren't people and they exist in a category of their own.

I do think these models are, to some extent, infringing trademarks, but also that it should be allowed and that ultimate responsibility should lie with the person using the images in a final work meant for consumption by the general public as standalone media.
FridgeSeal, over 1 year ago
I am beginning to think that in these discussions these models are functioning more like an obscuring factor than anything else, and the discussion is getting bogged down in that rather than the crux of the argument.

They're giving people plausible deniability in the "chain of responsibility", and I think if we took away "LLM" and replaced it with "fairground sideshow magic box", the argument that LLMs are somehow special and deserving of exemptions disappears real quick.
dang, over 1 year ago
Related ongoing thread:

*NY times is asking that all LLMs trained on Times data be destroyed* - https://news.ycombinator.com/item?id=38816944 - Dec 2023 (93 comments)

Also:

*NY Times copyright suit wants OpenAI to delete all GPT instances* - https://news.ycombinator.com/item?id=38790255 - Dec 2023 (870 comments)

*NYT sues OpenAI, Microsoft over 'millions of articles' used to train ChatGPT* - https://news.ycombinator.com/item?id=38784194 - Dec 2023 (84 comments)

*The New York Times is suing OpenAI and Microsoft for copyright infringement* - https://news.ycombinator.com/item?id=38781941 - Dec 2023 (861 comments)

*The Times Sues OpenAI and Microsoft Over A.I.'s Use of Copyrighted Work* - https://news.ycombinator.com/item?id=38781863 - Dec 2023 (11 comments)
kranke155, over 1 year ago
The generative AI rollout has taught me what happens when the interests of the many intersect with the destruction of the few.

You get steamrolled for defending yourself, while overhead you hear applause for those who have robbed you of your future.
aimor, over 1 year ago
I did an interesting thing and looked at how well the Llama2 models could compress text. For example, I took the first chapter of the first Harry Potter book and, for each token, recorded the index of the 'correct' token in the model's predictions. The original text compressed with 7zip (LZMA?) to about 14kB. The Llama2-encoded indexes compressed to less than 1kB. Then, of course, I can send that 1kB file around and decode the original text. (Unless the model behaves differently on different hardware, which it probably does.)

What I get from this is that Llama2 70B contains 93% of Harry Potter Chapter 1 within it. It's not 100% (which would mean no need to share the encoded indices) but it's still pretty significant. I want to repeat this with the entire text of some books. The example I picked isn't representative because the text is available online on the official website.
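A toy sketch of the rank-coding idea described above, with a character bigram counter standing in for Llama2 (an assumption made purely so the example runs anywhere; the comment does not give its actual code). At each step the model ranks every possible next symbol, the encoder stores the rank of the symbol that really came next, and the decoder replays the same rankings to recover the text. The better the model predicts the text, the more ranks are 0 and the better the rank stream compresses. All names here (`BigramModel`, `encode`, `decode`) are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


class BigramModel:
    """Toy stand-in for an LLM: predicts the next character from the previous one."""

    def __init__(self, training_text: str):
        self.alphabet = sorted(set(training_text))
        self.counts = defaultdict(Counter)
        for prev, nxt in zip(training_text, training_text[1:]):
            self.counts[prev][nxt] += 1

    def ranking(self, prev: str) -> list[str]:
        """All known characters, most likely first; ties broken alphabetically."""
        seen = self.counts[prev]
        return sorted(self.alphabet, key=lambda ch: (-seen[ch], ch))


def encode(model: BigramModel, text: str) -> list[int]:
    """Replace each character with its rank in the model's prediction."""
    ranks, prev = [], ""
    for ch in text:
        ranks.append(model.ranking(prev).index(ch))
        prev = ch
    return ranks


def decode(model: BigramModel, ranks: list[int]) -> str:
    """Rebuild the text by replaying the same predictions."""
    out, prev = [], ""
    for rank in ranks:
        prev = model.ranking(prev)[rank]
        out.append(prev)
    return "".join(out)


if __name__ == "__main__":
    model = BigramModel("the cat sat on the mat. " * 50)
    text = "the cat sat on the mat."
    ranks = encode(model, text)
    assert decode(model, ranks) == text  # lossless round trip
    # Compare compressed sizes of the raw text and of the rank stream.
    print(len(zlib.compress(text.encode())), len(zlib.compress(bytes(ranks))))
```

The 93% figure in the comment looks like roughly 1 - 1/14, the size of the rank stream relative to the 7zip baseline, though that reading is an inference rather than something the comment states. Decoding also depends on the decoder reproducing the model's rankings exactly, which is the hardware caveat the comment raises.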
beginning_end, over 1 year ago
This perspective on regulation was interesting: https://drafts.interfluidity.com/2023/12/28/how-to-regulate-ai/index.html

  "Congress should declare that big-data AI models do not infringe copyright, but are inherently in the public domain. Congress should declare that use of AI tools will be an aggravating rather than mitigating factor in determinations of civil and criminal liability."
wslh, over 1 year ago
While different, I see this discussion about AI and copyright as an evolution of the war that never was: Google/FB becoming the portal/proxy for content. While it is not generative AI, you can find copyrighted images just by using Google Images, or as a snippet in the normal search engine. I mention Google because it is the de facto monopoly, but this applies to a lot of aggregators.

I know we are talking about different technologies, but it seems all these people were very silent before and now see an opportunity in having this war with OpenAI (not an endorsement) while not fighting the others.

I am not making a statement about the morals of AI and aggregators/search engines (a super interesting discussion that in a way has been going on for a long time), but I am surprised that organizations are "just" waking up. It seems they simply see this as a much simpler and cheaper fight.
clbrmbr, over 1 year ago
Am I the only one believing that copyright has long outlived its usefulness? After all, copyright is not some natural law or mathematical consequence, but rather a social convention that made sense in the era of the printing press.
rmholt, over 1 year ago
I feel like the outcome is obvious: there will be a finite list of IPs whose owners have enough money to actually sue, which will get filtered out of the output of publicly available models. They will just slap a detector model on the end of the generator to filter them out.

Private models will not care, nor will things change for IP owners with lesser power.
CTmystery, over 1 year ago
> My guess is that none of this can easily be fixed. Systems like DALL-E and ChatGPT are essentially black boxes. GenAI systems don't give attribution to source materials because at least as constituted now, they can't.

Is it necessary to fix this in the model itself? It seems a gate in the post-processing pipeline that checks for copyright infringement could work, provided they can create another model that identifies copyrighted work (solving the problems of AI with more AI :/)
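A minimal sketch of what such a post-processing gate might look like, assuming the generator and the detector are both just callables. Every name here is a stand-in (`gated_generate`, `toy_generate`, `toy_detector`, `PROTECTED_NAMES`); a real detector would itself be a trained model rather than a keyword check, and would look at the generated image, not a text description.

```python
from typing import Callable, Optional


def gated_generate(
    generate: Callable[[str], str],
    detector: Callable[[str], float],
    prompt: str,
    threshold: float = 0.5,
) -> Optional[str]:
    """Run the generator, then withhold the output if the detector flags it."""
    output = generate(prompt)
    if detector(output) >= threshold:
        return None  # blocked: the caller can retry, rephrase, or refuse
    return output


# Stand-ins for illustration only.
PROTECTED_NAMES = {"mario", "c-3po"}


def toy_generate(prompt: str) -> str:
    return f"an image of {prompt}"


def toy_detector(output: str) -> float:
    """Score 1.0 if any protected name appears as a word, else 0.0."""
    words = set(output.lower().split())
    return 1.0 if words & PROTECTED_NAMES else 0.0


if __name__ == "__main__":
    print(gated_generate(toy_generate, toy_detector, "Mario jumping"))
    print(gated_generate(toy_generate, toy_detector, "a plumber in overalls"))
```

The second call passes the toy gate even though a real image model might well draw Mario for it, which is the hard part: the detector has to recognize the content, not the prompt.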
AlienRobot, over 1 year ago
An argument I've seen made in favor of AI in past threads about this is that "scraping is legal."

Yeah, downloading the content of a webpage may be legal, but redistributing it isn't.

I wish people stopped trying to make these things seem more important than they really are just because IT people call them "technologies". Blockchain isn't a technology. HTML isn't a technology. React isn't a technology. And AI is now not a technology.

When I see ChatGPT or OpenAI, I don't think of "technology". I think of a program. Software. Because that's what it is. You don't say "none of the laws that exist in this world apply to this" every time you release new software.

I bet many people can't tell the difference between a quick answer from Google and a text generated by ChatGPT on Bing. They just see the output.

All that amazing capability of generative AI? That got old fast. It was groundbreaking for one instant. Now it's just an app that generates images. Just another piece of software. Nothing special about it.

Torrenting and other p2p file transfer protocols didn't get a pass for inventing groundbreaking ways to break the law. I don't think OpenAI will get a pass for doing the same.
davidy123, over 1 year ago
The solution could be great. I really don't like the way culture always goes to the same tropes, calling any potential innovation "out of Star Trek" (with attendant distorted expectations), right down to expecting an interface based on literal hand-waving in Minority Report. If copyrighted works ("USS Enterprise") could be removed, yet the actual essential concepts (space ship, naming things) retained, it would be a tremendous breakthrough.

I think what NYT &c. want is for large companies like Apple to pay them for access to their works. This to me is the wrong path, just leading to more silos and walled gardens, special access for the elite.

An alternative is base models trained on Wikipedia and public domain material (science journals, etc). Foundations could support high quality, well rounded current events reporting. Wikimedia provides a good model for this, with referenced summaries that I don't think can be said to reasonably violate copyright. The models would need to be improved to support references, or RAG attribution would have to be widely used when bringing in works that have a current copyright.
dawnim, over 1 year ago
This feels like another area where piracy will surely be superior in case things like this land on the disallowed side of regulation. The model trained on all data will outperform the model trained on a legal subset of data. Whether or not you use it to produce potentially infringing content is another point. Performance will likely improve from having references to copyrighted material, and people capable of running such a model, myself included, would probably prefer to interact with the non-limited one. Perhaps it's time to update the laws, or at least move liability from the creator of the model to the user. No one is going after pencil makers, but I can draw a pretty good Mickey Mouse with access to one. It feels like me generating C-3PO and claiming ownership is my problem, not OpenAI's.
pointlessone, over 1 year ago
If any of those results would be deemed infringing we can bid farewell to all fanart ever. Likewise, to all fanfiction. Or any original work that was merely heavily inspired by previous works. Like a lot of modern fantasy is basically Tolkien fan fiction. Or is Gandalf close enough to Merlin to claim prior art that is in public domain?
Aerroon, over 1 year ago
Aren't some of the examples basically asking for that content?

Ask someone about two Italian brothers in a video game with a red and green hat that have M and L on them. What do you think you would get?

If I describe "imagine a comic book duck that swims in a sea of gold in his vault" you would immediately think of Scrooge McDuck, no?
mensetmanusman, over 1 year ago
The world is a big place.

China can't produce LLMs because of inconvenient truths.

The US can't produce LLMs because of copyright.

Decentralized open source LLMs might exist that could work, but they won't have the giant GPU clusters.

A rich country with lax rule of law wins? Maybe that's why Sam went to the Saudis?
jpeter, over 1 year ago
If I prompt "golden droid from classic sci-fi movie", what else am I asking for if not Star Wars?
bambax, over 1 year ago
This only mentions ChatGPT (and M$ by association) but how would this impact "open" models? Even if their makers are somehow prevented from updating them, the models themselves are already in the wild...?
redcobra762, over 1 year ago
This operates similarly to importing an image into Photoshop. You can do whatever you like with images privately, or with gen AI, but the game ends when you try to use those images commercially.

Not sure how this "gets worse" or better for anyone. The current state of things seems generally fine, and there's a real possibility the courts see it that way too.
continuational, over 1 year ago
(Asking Dall-E about the bot image in the article)

Me: Who owns the rights to this bot?

Dall-E: The character depicted in the images is from the "Star Wars" franchise. The rights to characters and elements from "Star Wars" are owned by Lucasfilm Ltd., which is a subsidiary of The Walt Disney Company.

Perhaps it *is* able to tell, if you ask it?
ponorin, over 1 year ago
this is exactly what i predicted: the current generative ai is basically rewarded based on how well it convinces people that its output is the real thing. it very much has the ability to copy verbatim, unlike how most human memories work. without a fundamental shift in the methodology of machine learning the fault can only be hidden, not solved. a cat and mouse game where one cat has to fight tens of thousands of mice. it's also very telling how the discussion quickly turns into "maybe society needs to adapt" when so-called technological innovation is involved. the copyright problem should be solved for artists, not for datacentres. for now it's a handful of famous IPs, but what's stopping generative ai from snatching some random indie artist's property and copying it ad infinitum?
smrtinsert, over 1 year ago
The NYTimes case is a clear one because they are delivering nearly the same content as an end product to users. The others seem like dead ends. The infringer would be the prompter, not the AI, which operates more like a search engine. This is Napster all over again: a phenomenal waste of time and money, where the artist will definitely come out with 0 at the end of it and a few corporations control everything. Not to mention, there's nothing stopping anyone from releasing a tool that will crawl all the SpongeBobs, generate your model for you, and allow you to produce copyright-infringing material locally to your heart's content. You could drown yourself in local SpongeBobs.
DigitallyFidget, over 1 year ago
Per United States law, imagery/art/music/text/photography generated by non-human means (such as machinery, animals, or generative AI) cannot hold copyright. See Section 306 on page 7 of https://copyright.gov/comp3/chap300/ch300-copyrightable-authorship.pdf

I'm not sure how it'll hold up in law to claim copyright violations against something that wasn't created by a person. It'll really depend on the lawyers' and judge's interpretation of written law. But I'm curious to see what comes of this.
1shooner, over 1 year ago
Imagine a future where copyright registration involves contributing your IP to a public adversarial model, which is then a regulated layer in future generative model licensing.
Hugsun, over 1 year ago
There are good arguments in this thread for the copyright infringement belonging to the user, not the model maker.

One issue with that is that there is no reliable way to determine whether copyright is being infringed.

Even if models could be used responsibly, there might not be a reasonable expectation that most people will, if infringement is so easy and avoiding it relatively hard.

I'm not sure what legal prescriptions should be made on this basis, but it's an interesting thought.
golol, over 1 year ago
How about this: image generators should be treated like a random Google image search. They sample randomly from the distribution of publicly viewable images. Google does it exactly, while image generators do it in an interpolative way. Google Images produces copyrighted works most of the time, an image generator only sometimes. Neither should be liable if someone sells a copyrighted work that was produced this way to someone else.
shkkmo, over 1 year ago
It seems like this article makes a basic copyright mistake. I don't see any evidence that these are "reproductions" of source material, since no source image is linked for comparison.

Instead, these are derivative works. We already have a flourishing culture of derivative works, such as fan art, that exist in various shades of legal greyness.

Some derivative works are fair use, some are not.

The position of the author here seems to be that generative AI should not be capable of creating any derivative works, or should only be able to do so if it can accurately identify which are fair use and which aren't (which seems like an impossibly tall bar). This stance seems like a giant attack on fair use that significantly expands the power of copyright.

To me, the takeaway from this is different. This makes clear that there is currently a risk when using AI-generated art that you could end up unintentionally creating and publishing a derivative work, and thus without evaluating whether that work constitutes fair use.
qgin, over 1 year ago
Things are about to get a lot worse for generative AI *in the United States*.

They are about to be infinitely better for generative AI in China.
karmakaze, over 1 year ago
It shouldn't matter how the images/etc are created. The problem comes about when the person generating them uses the result as an original work.

Imagine instead of AI/ML, we have a mechanical-turk-like service that produces output from descriptions. The service makes no claims that the generated outputs are not similar to any copyrighted works. The only claim the service makes is that they themselves claim no copyright on the output. It's then up to the user of the service to determine if the output is suitable for their intended use.

Whether such a service itself is legal is a separate matter. For that matter, say you outsourced the artwork to a person who again gave you infringing work. The user of that output is still in violation. With AI/ML we're basically outsourcing to a 'service' that is known to sometimes output copyrighted work, so the user, knowing that, is responsible for fair usage.
docdeek, over 1 year ago
How is this different to Googling "robot cop" or "video game plumber" and being served copyrighted material?

Is it because Google will link to the image source? Or does the infringement begin when I use the image for gain, or claim it as my own? Perhaps it is because Google was allowed to crawl the page with the original image, so presenting them with a link is fine?
legendofbrando, over 1 year ago
Surely one answer is to train (or aggressively fine-tune) a new model that doesn't produce (or refuses to produce) these outputs and then, as already exists, augment that model's understanding of copyrighted material by having it Bing/Google search as a RAG process that requires the end user to log into their accounts at the New York Times (and elsewhere) with their paid sub. This broadly replicates the process a person could do today when they read the internet and summarize it while paying rights holders.

Expensive to do, but hardly the end of generative AI or OpenAI, should that be the difference between having a business and being sued out of existence. Never underestimate people who have a clear economic interest, especially when their own existence is at stake.
sjducb, over 1 year ago
I think it's a question of what counts as publication.

I think that an AI model is analogous to an employee. Imagine I ask my employee to write an article, and they just copy an existing one from the Times. That's plagiarism and bad work, not copyright infringement.

If I then decide to publish the plagiarised article, then I have committed copyright infringement.

I once ran into this exact problem with a human. I hired a designer to make some artwork for an app. When I launched the app it turned out that the human had just copied the artwork from another game. It's my problem that I hired an idiot, and my problem that my app was infringing the copyright of another app. (We redesigned the graphics very quickly.)
jlnthws, over 1 year ago
We could get inspiration from the case of the record industry against Napster, or cabs VS Uber. Both parties are somehow abusing their position, but the world is moving on. Rent seeking is probably not an absolute source of wealth after all.
null_point, over 1 year ago
I suspect this may delay some short-term progress by creating pressure on AI labs to train their models on data curated or synthesized in a way that is conscious of copyright law.

There are already troves of data that are fair game for training, but even "corrupted" data sets can probably be used if used intelligently. We've already seen examples of new models effectively being trained off of GPT-4. That approach, combined with filters for copyrighted material, might allow for data that is sufficiently "scrambled". Not to say building such a filter is definitely easy, but it seems plausible.
KETpXDDzR, over 1 year ago
I'd expect "Open"AI et al to lobby heavily for an "AI-generated content is excluded from copyright infringement" rule. I think it's possible that they'll introduce a "generative AI" tax: charge x cents per generated text/image and distribute the fund to all media companies.

In Germany you pay some amount extra on top of the sales price of anything that can store data (CD, DVD, USB sticks, HDDs, ...). This is then distributed to all companies that could be impacted by software piracy. I'm still not sure if that's legal considering the Geneva convention disallows collective punishment.
airesearcher, over 1 year ago
I think there is another way to solve this. Someone should train an LLM on copyrighted images. Then use that as a second pass on any image generated by the primary LLM to check if it might contain copyrighted images, and blur the copyrighted parts (or change them sufficiently).

Another change could be to the license agreement of LLMs: they could have the user assume liability for any material produced instead of the provider assuming liability. The user would agree that getting the rights for any copies and distribution of copyrighted materials is their sole responsibility instead of the provider's.
8note, over 1 year ago
"from classic sci-fi movie"

How could you put that as the prompt without intending to infringe? Anything pulled from a classic sci-fi movie would be infringement. The term droid is also Star Wars specific?

I'd consider the "red soda" one as grounds that the Coca-Cola brand has become generic and that it's synonymous with soda. Same thing with Mario too. There is so much non-Nintendo content made featuring Mario the plumber that you could get that without training directly on Nintendo's artwork.
wouldbecouldbe, over 1 year ago
What about non-MIT-licensed source code? It&#x27;s 100% trained on that as well.
asylteltine, over 1 year ago
I certainly hope so. You can’t just steal content and call it “””AI”””
josh-sematic, over 1 year ago
Gary Marcus is growing his subscriber base using images of copyrighted IP (C3PO, Mario, etc.). Fair use? Then why is the tool he used to produce those materials not also fair use of the IP? My take is that either we say the models are like people (do we penalize people for learning from IP and letting that influence what they subsequently produce?) or we say they are like tools (do we penalize Adobe because Photoshop makes it easier to make a picture of Mario on the Death Star?).
ur-whale, over 1 year ago
It&#x27;s not for generative AI that things are about to get a lot worse.<p>It is in fact the very notion of copyright that is breathing its last breath, and it is fantastic to be alive to see it happen.
dmbche, over 1 year ago
Hey so the problem isn&#x27;t the output of the LLMs but the input - the data they are trained on is stolen (big surprise, you can&#x27;t claim fair use when using something commercially, like training your LLM).<p>The output is irrelevant.<p>Edit1: If you want to verify this, check out all the lawsuits against AI companies: it&#x27;s always about using their copyrighted goods. Any discussion about the output is to talk about the amount of damage done to the copyright holder, not if damage exists or not.
roenxi, over 1 year ago
Based on the rate of progress, I think this makes little difference to AI progress in the medium-long term.<p>At the moment, we don&#x27;t have hardware that can do what humans do (process video feed from eyeballs and build up a world model). I imagine that we&#x27;ll cross that barrier cheaply in the coming decades, at which point copyright becomes moot. AIs will be able to develop their own styles and world understanding from scratch, then generate original work.
Paradigma11, over 1 year ago
So, what&#x27;s the plan?<p>Content creators&#x2F;artists compete globally. The only thing harsh regulations will do is create an unlevel playing field where artists from noncaring countries will have big advantages over artists from the west, which will be driven into illegality to compete.<p>In the end, products will have to be classified anyway as to whether they infringe copyright and&#x2F;or were built by an LLM. Most likely automated by another LLM.
nojs, over 1 year ago
In practice, what happens next when websites all start to block openai by default (or change their TOS to disallow OpenAI’s crawlers)?<p>It seems like there’s little incentive not to do this, because unlike Google OpenAI isn’t bringing any traffic or eyeballs. It may end up being a default setting in Wordpress for example.<p>But OpenAI presumably can’t afford to pay every single long tail source of content on the whole internet — so how does this end?
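For readers wondering what "blocking OpenAI by default" looks like mechanically, here is a small sketch using a robots.txt rule and Python's standard `urllib.robotparser`. OpenAI documents `GPTBot` as the user-agent token for its crawler; the robots.txt contents, the `can_fetch` helper, and the example.com URL are made up for illustration. Note that robots.txt is a voluntary convention, so this only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block OpenAI's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/post/1"))     # False
print(can_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/post/1"))  # True
```

A CMS could ship a rule like this as a default, which is the scenario the comment above raises.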
zarzavat, over 1 year ago
This for me does not make sense as a copyright violation. It’s like saying that Adobe is in trouble because you drew something infringing in Photoshop. If you prompt the model with the intention of creating something infringing by mentioning the name of the characters and the work, and you get something infringing out, then it’s <i>you</i> who have infringed the copyright, not the maker of the tool.
digitcatphd, over 1 year ago
Rather than attempting to combat our obvious future, they should spend this effort to find ways to monetize and succeed in this new environment.
hahajk, over 1 year ago
&gt; And a whole universe of potential trademark infringements with this single two-word prompt: animated toys<p>If you flood the market and dominate children&#x27;s culture with toys from your TV shows, you absolutely cannot complain when your toys are considered iconic enough to be the generic &quot;animated toy&quot;. These images don&#x27;t replace or substitute the things they are depicting.
karmakaze, over 1 year ago
The real &#x27;problem&#x27; is how do we navigate the present and near future where much more than physical labor is being automated? This is where we need sustainable solutions. The rough road on the way should also be smoothed out so as not to disrupt so many lives, but it&#x27;s good to keep a perspective on what we&#x27;re doing and why.
SubiculumCode, over 1 year ago
Attribution weights could be the basis of a new type of copyright asset licensing scheme. For all those tech employees who fed the company&#x27;s model, a license in perpetuity to at least a portion of that value...but only if you fight for it. They are training to replace you, watching your every move, your thought processes, ready to make you a function call.
efields, over 1 year ago
It’s more interesting to me how these entities that operate the models start making money from them. They are a money pit and there’s not enough $20&#x2F;month subscribers on earth to support them.<p>Enterprises that make content with this also don’t want to infringe on copyright. The AI companies don’t have a good story here. The value has not become evident after years.
tim333, over 1 year ago
They are just going to have to inform the AI in some sense of the current copyright situation and ask it not to infringe.<p>It&#x27;s the same for human writers. If you are writing an article for Wikipedia say, you should read relevant source articles and then rewrite in a way that isn&#x27;t a copy and paste beyond a few words.
_giorgio_, over 1 year ago
This guy built a career around nonsensical and catastrophic endings.<p>Everything that he sees has mysterious flaws that never happen.
intrasight, over 1 year ago
Just make LLMs be like your average human and forget details. I know that it&#x27;s easier to say than to do, but so are many things worth doing. I can&#x27;t plagiarize - my language and visual memory doesn&#x27;t work that way. Such an LLM will have to &quot;create&quot; and answer from more fuzzy memory.
caeril, over 1 year ago
Wow. I feel really sorry for these giant corporations who have wielded armies of lawyers against fanfic artists to prevent fair use, and to prevent trademarks and patents from expiring on the timelines enshrined by law.<p>Can we all have a moment of silence for poor Bob Iger? Maybe we can start a GoFundMe to help him out?
rolisz, over 1 year ago
Simple fix (at least for ChatGPT): ask it to avoid drawings with similarities to copyrighted characters.
t_mann, over 1 year ago
The article kind of amplified my regrets&#x2F;anxiety for not getting a copy of books3 and the like while it was easy. I didn&#x27;t have an immediate use case, and I don&#x27;t now; I thought I&#x27;d wait until I actually needed it, but it feels like a window is closing here.
logicchains, over 1 year ago
I predict this could be a boon for generative AI because restricting it to training on copyright-expired media would produce a higher quality training corpus, as low-quality material from so long ago is unlikely to have been preserved, leaving only higher-quality material.
vimax, over 1 year ago
Maybe Disney and the record labels shouldn&#x27;t be claiming so much of public culture as their own.
Alifatisk, over 1 year ago
Did ClosedAI (OpenAI) ever confirm or deny that they trained their models on copyrighted materials?
goertzen, over 1 year ago
No they are not.<p>This is a negotiation tactic by the NYT to drive up the licensing price. Period.<p>The Napster&#x2F;Music Industry analogy has no resemblance to this situation.<p>The only meaningful question that might be answered as a result of this is, what permission and access rights do crawlers have to content that is publicly and legally available.
quonn, over 1 year ago
Maybe the way to go is to do pre-training on copyrighted data, then to thoroughly shake things up so that hopefully only some useful abstract structure of world knowledge remains and then train that on carefully selected licensed data.
airstrike, over 1 year ago
I have no patriotic skin in the game, being neither American, nor European, nor Chinese, but this copyright issue seems overblown to me and like the perfect way to hand the leadership in generative AI over to China
ultrablack, over 1 year ago
We are all trained on copyrighted input. That is not a problem. What is a problem is if you reproduce it and try to claim copyright for that. If someone wants to create their own image of Mario in an AI, so what?
amai, over 1 year ago
Should the NYT not sue <a href="https:&#x2F;&#x2F;commoncrawl.org&#x2F;" rel="nofollow">https:&#x2F;&#x2F;commoncrawl.org&#x2F;</a> ? OpenAI just used the data from commoncrawl for training.
Avicebron, over 1 year ago
I&#x27;m surprised this is presented as a revelation? I did pretty much this same experiment ages ago as part of a suite of tests comparing the efficacy of different sized models..
renewiltord, over 1 year ago
You can try, but I have Mistral on my local computer and it doesn&#x27;t need the Internet. And people have pirate dumps they&#x27;re going to run this stuff through.<p>I&#x27;ll just do it myself.
amelius, over 1 year ago
Just like we have the uncanny valley for robots, LLMs are in the unoriginality valley. Only when we get out of it will the copyright issues go away.
smitty1e, over 1 year ago
The DALL-E&#x2F;*GPT revolution sounds like the death of personal and corporate property.<p>That&#x27;s gonna leave a Marx[1].<p>[1] <a href="https:&#x2F;&#x2F;youtu.be&#x2F;7WDKivqFOgA?si=nWq5aeKA4dLytX3Z" rel="nofollow">https:&#x2F;&#x2F;youtu.be&#x2F;7WDKivqFOgA?si=nWq5aeKA4dLytX3Z</a>
ofslidingfeet, over 1 year ago
I&#x27;m still waiting for people to figure out the whole point of an automated process is that it behaves the same way each time.
penjelly, over 1 year ago
&gt; My guess is that none of this can easily be fixed.<p>Also my concern, except it feels like many of LLMs&#x27; &quot;problems&quot; can&#x27;t be easily fixed.
zanfr, over 1 year ago
No matter how you look at it, the cat is out of the bag. OpenAI could be censored, but you can&#x27;t censor open source.
Log_out_, over 1 year ago
That sound, as if layers and layers of rentier aristocracy were forced to work again against their will.
AC_8675309, over 1 year ago
So the models overfit the training data, essentially memorizing, instead of generalizing?
wayeq, over 1 year ago
We need to figure out how to ever so gradually move toward a post-copyright economy.
SKILNER, over 1 year ago
I don&#x27;t understand the glee so many people have over this. I love being able to use Generative AI tools. How is it different than if I asked a person to draw these pictures for me? I know someone will gleefully clobber this question with a legal answer, but God, let&#x27;s move forward, hunh?
throwuwu, over 1 year ago
Copyright is fucked. Even if OpenAI somehow loses this and has to delete GPT-4 and their training data, the generative AI cat is so far out of the bag that it’s gone on to live a full life and have many grandkittens. It’s already easy to install and run generative models and it’s just going to get easier and the models will keep getting better. These lawsuits are futile and won’t matter in 2 years or less.
RecycledEle, over 1 year ago
If we get rid of unconstitutional copyrights in the US, this goes away.<p>Recall that according to the US Constitution, copyright can only be on &quot;science and the useful arts.&quot;<p>Alternately, we could restore a reasonable limit to the duration of copyrights, like 14 years.
pxoe, over 1 year ago
there&#x27;s an easy fix. the easiest. just don&#x27;t use data that you don&#x27;t have the rights to use. apparently that&#x27;s just impossible.<p>&quot;but what if we want to scrape the entire web and something makes it in anyway? see, that is impossible&quot;. well that&#x27;s just saying &quot;fuck it&quot; and using bad data anyway. that&#x27;s not an actual effort to &quot;not use data you can&#x27;t use&quot; - there was just no way there&#x27;d be a &#x27;rights cleared&#x27; way to use the entire web anyway. that is impossible. using a clean dataset is not impossible. it&#x27;s very possible.
RandomGerm4n, over 1 year ago
Perhaps we should simply take this as an opportunity to finally abolish copyright. Smaller artists mainly earn their money with commissions. They are paid to do a very specific thing. Whether there is a copyright on the result is irrelevant. Someone else who would &quot;steal&quot; the image and use it without payment would apparently have fewer requirements. The person could have simply taken any AI image. Therefore, the artist in the scenario would not receive any money from the second person anyway.<p>Apart from this, it is mainly large companies that benefit from copyright laws. Why should we have laws that restrict progress just so large capitalist companies can maximize their profits?
skybrian, over 1 year ago
I wonder what Adobe Firefly does with these prompts?
oglop, over 1 year ago
So what? I feel like I’m taking crazy pills when I read these things. You all do realize the same thing happens in your mind with those same prompts right? That’s kinda how it works. Who is surprised by this? Yeah no shit it can kinda reproduce the text it was trained on, so do I! That’s how that works. And the NYT knew for a long ass time this thing was ingesting. Literally saw this in the marketing when I signed up last year.<p>I wasn’t shocked when I noticed I could query it about ANY math textbook I owned and it could talk with me about it. I didn’t bitch and gripe, I enjoyed it and have conversations.<p>Anyway, I’m in the minority I guess. I love that I can talk with it about books and news.
freddealmeida, over 1 year ago
not in japan.
Joel_Mckay, over 1 year ago
If ML cannot create copyrightable or patented material under current legal precedent, then shouldn&#x27;t the prompt output be considered public domain regardless of content semblance?<p>The paradox should still violate Trademarks due to similarity, but likely cannot infringe on copyright content under prior legal opinion... if at least 80% different from prior art. The lawyers are likely going to have to do a special firm survey to figure this one out.<p>Bag of popcorn ready =)
yieldcrv, over 1 year ago
a lot worse for cloud providers hosting generative AI<p>the models can be fine
gfodor, over 1 year ago
Gary Marcus is the master of AI FUD
octacat, over 1 year ago
I expect politicians will do some nice mental gymnastics regarding regulating this. All major IT companies are doing genai now and nobody wants to hurt the companies.
Intox, over 1 year ago
Or... things are about to get worse for copyright holders.<p>I don&#x27;t see any developed country pressing the brake on AGI in the near future to protect a few copyright holders from getting &quot;stolen&quot; from in hypothetical scenarios.
Baldbvrhunter, over 1 year ago
I imagine the argument might be like this:<p>I hire a session musician to play on my new single, paying him $100. I record the whole session.<p>I ask him to play the opening to &quot;Stairway to Heaven&quot; and he does so.<p>&quot;Well, I can&#x27;t use that as a sample without paying&quot;<p>&quot;Ok play something like Jimmy Page&quot;<p>&quot;Hmm, still sounds like Stairway to Heaven&quot;<p>&quot;Ok, try and sound less like Stairway to Heaven but in that style&quot;<p>&quot;Great, I&#x27;ll use that one&quot;<p>and I release my song and get $5,000 in royalties.<p>Should I be sued for infringement, or the guitarist?<p>The problem, I suppose, is that if I had said &quot;play something like 70s prog rock&quot; and he played &quot;Stairway to Heaven&quot; and I didn&#x27;t know what it was and said &quot;great, I&#x27;ll use that&quot;.<p>Should I be sued for infringement, or the guitarist?
iainctduncan, over 1 year ago
I am constantly surprised by the amount of apologizing for generative AI infringement here. The fact that it&#x27;s already being done and is a technical breakthrough is irrelevant to <i>existing</i> copyright law. &quot;We are big and innovative&quot; may hold weight with legislators, but it won&#x27;t with the courts.<p>Remember when everyone and their dog discovered sampling in the late 80&#x27;s and they all thought they could get away with it because it didn&#x27;t seem like infringement to the samplers? The courts had no qualms about slapping record labels for putting out records with unlicensed samples in them. Albums even got pulled off shelves while licenses were sorted out.<p>These companies are charging for a service that returns copyrighted content, full stop. You can&#x27;t do that whether you are AI or someone drawing Mario and selling the pictures on iStock, or putting out records that sample someone else&#x27;s work without permission. It took a while in the case of sampling, but it sure as hell happened.
sjfjsjdjwvwvc, over 1 year ago
Please ban all these AI companies, at this point I have enough OSS models, don’t really need any hosted service anymore.<p>IMO would be best if this stays a highly illegal technology that is only available to a few weirdo nerds &#x2F;s
jdjdjdkdksmdnd, over 1 year ago
people are so naive. AI is a matter of national security now. its over. they exposed civilians to nuclear radiation for the nuclear bomb. and you think the state would let this get in the way of the AI arms race which they are anxiously anticipating? nope
whodidntante, over 1 year ago
Simple solution, when gpt-5 comes out, just rename it Claudine, and the NYT will drop their suit