科技回声
AI tools like ChatGPT are built on mass copyright infringement

54 points by sherilm, almost 2 years ago

23 comments

Animats, almost 2 years ago
Probably not. Humans are trained by reading books and writing something similar, but not identical. Unless what comes out looks a lot like a specific thing that went in, it's probably not copyright infringement in the US.

Artists making copies of famous pictures is a standard part of artist training. Tribute bands are a thing. Elvis impersonators are a mini-industry. The lawsuits so far tend to involve "passing off" new work as someone else's work, as with that ML-generated popular song passed off as a new work by a major artist. That's not a copyright issue. That's ordinary fraud.

The bitching from writers and musicians reflects not that they're being ripped off, but that they're being out-produced. Authors didn't expect to be in the position of John Henry vs. the steam hammer. Now they are.

koboll, almost 2 years ago
Copyright infringement is when you take copyrighted work and distribute it directly, or so close to directly that it can't be said to be "transformative".

Obviously LLM outputs are transformative, so this argument falls completely flat. As the writer is a copyright lawyer, it's hard to conclude anything other than that they are knowingly lying, or at minimum wishcasting what they want the law to say instead of what it does say.

I think the misconception stems from the layman's understanding of copyright clipping off the last part of that sentence so it's just "Copyright infringement is when you take copyrighted work".

Proof of the success of industry campaigns to vilify things like taping broadcast television.

dragonwriter, almost 2 years ago
In the US, copyright is not viewed as a natural property right but as a contingent property interest for the limited purpose of promoting “the progress of science and useful arts”.

Pointing out that a major advancement would be foreclosed by a proposed interpretation of the scope of copyright law is an argument that that interpretation is either an incorrect interpretation of the statute or an area where the statute conflicts with its Constitutional purpose.

Of course, the Globe and Mail is Canadian, but is Canadian law the applicable governing law here?

version_five, almost 2 years ago
The time for copyright has long passed anyway. It's not clear to me that any infringement actually takes place in training AI, and I think we've all seen the arguments. But even if it did, it just means that "infringement" is a stupid construct that needs to be done away with. Intellectual property as a whole is a failed experiment that doesn't actually spur creation, which is the argued point for why the rest of us give up natural rights.

FWIW, the author appears to be the founder or former founder of Licensy - you can guess what it's about, so I wouldn't take the article as a neutral legal opinion anyway.

DennisP, almost 2 years ago
If it weren't for our dumb copyright laws, we could train AI on all the world's books and scientific papers instead of whatever's available online, and take the next great leap in human progress.

thomastjeffery, almost 2 years ago
There are two ways to end up with a similar artwork:

1. Create enough art in the same restricted domain, and some works will *happen to* look similar to others.

2. Use aspects of a work in the process of creating another. Without enough process, the result will resemble the original.

LLMs pretend to do #1: They first model a collection of works - called the training corpus - to implicitly generate a domain. Everything that is "hallucinated" by that model is restricted to that domain.

The entire goal of an LLM is to resemble the training corpus. The only interesting part is "in what way" that resemblance appears.

For example, an infamous paper illustrated that after being trained against valid Othello games (written as sequential piece positions), GPT can hallucinate a valid Othello game move. Does this mean that the game moves are novel art? Of course not! They are the *result* of the training corpus being modeled through a transformer. This is method #2 masquerading as method #1.

smoldesu, almost 2 years ago
Lukewarm take: Copyright is based on mass freedom infringement. If you publish something, you should be prepared for others to reproduce it and take it without credit. This is particularly true on mediums like the internet, where the net cost of copying data is effectively nothing.

I'll entertain arguments from either side of the infringement debate, but this opinion piece reads more like a reactionary defense of copyright than a measured response. These damages are... not that significant. It feels like Big Tech swooping in to defend Open Source not because they care about freedom, but because without it they would have no software. But instead of copyleft, it's copyright. Hmm.

nokya, almost 2 years ago
Facebook was built on exactly the same premise: stealing massive databases from others (i.e., women's profiles and photos hosted on student association and sorority websites).

Look how justice responded when MZ was sued: first by his university, then by the justice system. Neither did anything.

I'll let you just imagine what would have happened if MZ was of another ethnicity. Well, not only ethnicity, but the other factor is a taboo even here.

fooker, almost 2 years ago
If it's a useful technology, no amount of lawyering will stop it.

If you don't have it, your competitors will, and your people will use the competitors' software anyway.

MWil, almost 2 years ago
seems more like AI tools should be (if they're not) built on the [insert best argument here for fair use]. then you'll know you're safe.

guardiangod, almost 2 years ago
What distinguishes a transformation function like a human (20+ years of education with different input->output) vs a transformation function of a computer (20k hours of training->output)?

What about musician training? Movie directors?

gmuslera, almost 2 years ago
Probably the biggest copyright infringement is related to open licenses. It is the kind of code that is easiest to scrape, but its licenses place some restrictions on how it can be used, like including the license file, or requiring that code based on it keep the same freedoms as the original code (i.e., not using it in closed-source commercial programs).

Probably the same goes for open content licenses.

dataviz1000, almost 2 years ago
They output the most likely token (unless I'm mistaken), which means tools like ChatGPT are the ultimate prior art machines. They answer what is the most parallel amongst similar sources. For example, when I ask ChatGPT to build a state machine, the state machine it attempts to build is the sum of all prior art, not any one specific copyrighted machine.
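
Editorial aside on the "most likely token" point: a minimal sketch of greedy next-token selection, assuming a toy hand-written score table rather than a real model. The names `softmax`, `greedy_next_token`, and `toy_logits` are hypothetical and exist only for illustration. Real systems often sample from the probability distribution instead of always taking the maximum, so this shows only the simplest case.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def greedy_next_token(logits):
    """Pick the single most probable token (greedy decoding)."""
    probs = softmax(logits)
    return max(probs, key=probs.get)


# Hypothetical scores for the token that follows some prompt.
# These numbers are made up; a real model would compute them.
toy_logits = {"idle": 2.0, "running": 1.5, "stopped": 0.5}

print(greedy_next_token(toy_logits))
```

The choice depends on the scores for every candidate at once, which is the sense in which the output reflects the aggregate of the training data rather than one source.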

oh_sigh, almost 2 years ago
I find it somewhat problematic that the author does not disclose that she built her career not just around intellectual property, but specifically the business of licensing content.

On one hand, that makes her an expert in the legal aspects of this case. On the other hand, generated content presents a substantial threat to her current income stream.

al2o3cr, almost 2 years ago
Cynical prediction: this will end up with copyright laws for words even dumber than the ones for music, where putting three notes in sequence can become EXCLUSIVE CORPORATE PROPERTY if you have enough lawyers.

Exuma, almost 2 years ago
Now here's an opinion I don't care about.

kfarr, almost 2 years ago
YouTube built an audience on mass copyright infringement over a decade ago; I think the cat is out of the bag.

cft, almost 2 years ago
"As a former copyright startup founder": did he mean he's an aspiring patent troll?

nobodyandproud, almost 2 years ago
If so, this proves that copyright as we know it today hinders progress of the arts and sciences.

headcanon, almost 2 years ago
The purpose of copyright is to incentivize the creation of art by fairly compensating the creators. If creators cannot be compensated fairly, then artists cannot devote their full efforts to being creative, and thus art and innovation stagnate.

The problem is that under current copyright law, having to purchase a distribution license for everything an AI learns in its training data would be cost-prohibitive, to say the least, which also stagnates art and innovation.

I imagine a future where much of human effort is ultimately directed towards creating ground-truth data for a generative AI, whether that be text, pictures, art, or other media. I have a writer friend for whom this is happening already. We need to somehow incentivize ground-truth collection, and we don't have a way to do that right now apart from wages, which IMO is not sufficient.

To me, the question is not whether or not AI violates copyright, because we're clearly in new territory here. The question is, how do we properly compensate folks who are creating that ground-truth data? Do we need to restructure our economy from the ground up, or is there a path from the current capitalist economic model to doing this?

lvl102, almost 2 years ago
This is why all of this AI stuff needs to be open source and in the public domain.

perrohunter, almost 2 years ago
There's no reproduction, it's just reading the text.

riskneutral, almost 2 years ago
Time for copyright laws to be rewritten then.