Tech Echo
A tech news platform built with Next.js, offering tech news and discussion from around the world.
© 2025 Tech Echo. All rights reserved.

Authors say OpenAI 'ingested' their books to train ChatGPT

46 points by bratao, almost 2 years ago

12 comments

hn_throwaway_99, almost 2 years ago
It's really clear to me that a new legal framework will be required to deal with the societal consequences of advanced AI. For example, if this article were instead about, say, an actual human who read all of these books and then posted reviews and some summaries online, and this human then got sued by the authors, I think every single comment here would be decrying this as an abuse of copyright law that should fall squarely under fair use.

What is "scarier", if you will, about what AI can do is the sheer speed and breadth it is capable of. It really is the *scale* that changes how people feel about these technologies, and in my opinion that requires new legal frameworks, because I think people really feel that what counts as "fair use" is different if it comes from a person vs. a machine.

A similar analogy: before the Internet, pretty much everyone agreed you didn't have an expectation of privacy if you were walking around outdoors. But there is a marked difference between thinking "Yeah, I expect other people walking around may see me, or even take a picture of me" and "I think someone should be able to take a picture of me and put it on the Internet with my accurate geolocation so the entire planet can look it up, for all time."
constantcrying, almost 2 years ago
> They argue that ChatGPT's ability to produce detailed summaries of their works indicates their books were included in datasets used to train the technology.

This just isn't true. The authors' books have detailed summaries on Wikipedia, so that claim seems unsustainable.

There are actually interesting legal questions about AI, but this case doesn't seem that interesting. I don't even see how the authors would demonstrate that *their* copyrighted works were used, rather than some summary sourced from elsewhere.
b33j0r, almost 2 years ago
Ok. I may as well come clean. For the past 36 years, I have been ingesting all of the information available to train a model called b33-j0r.

I didn't mean to read your books; they were mostly trash. I'm so sorry for reading things you published!
musicale, almost 2 years ago
> They argue that ChatGPT's ability to produce detailed summaries of their works indicates their books were included in datasets used to train the technology.

If ChatGPT can reliably answer "What happens on page (x) of book (y)?", that would provide fairly convincing evidence, as summaries or study notes on a random book are unlikely to be that detailed. Moreover, if this approach worked consistently, it would enable the entire book to be summarized, or even rewritten, page by page.
robbie-c, almost 2 years ago
There's a lot of discussion about whether existing copyright laws apply to training data, but not enough discussion of what should change in copyright law to give the best outcome for society.

I don't think treating ML systems as legally equivalent to a human brain is right, but I also don't think copyright law is sufficient as-is. This is something entirely new.

It seems like society would benefit from having ML systems around that can be trained on copyrighted material, but not if that prevents authors from making a career out of producing great work. We need to balance the outcome for owners of ML companies, authors, and society as a whole, ideally prioritising the latter.

What do you think we should do?
WiSaGaN, almost 2 years ago
Fundamentally, this is about wealth distribution. Society does want that data to be used for AI. It's just that the data providers should be compensated, instead of OpenAI gatekeeping a product that critically depends on that data. The original model of copyright just doesn't work anymore in this new era.

Music streaming changed the music industry. Can we expect something similar to happen in AI?
aussieguy1234, almost 2 years ago
Well, to become an expert in any topic, humans have to go to university and read textbooks as part of the course. Do these humans owe anything to the authors, other than the purchase price of the books? Why should it be any different for AI LLMs/reasoning engines?
ChildOfChaos, almost 2 years ago
What are we going to classify these AIs as, though?

If I read a book and then tell someone else about it, that isn't copyright infringement. If the AI "reads" the book and tells someone else about it, they are claiming it is?
freddealmeida, almost 2 years ago
Japan has allowed us to do this legally. Going to be interesting over here.
barrysteve, almost 2 years ago
Same old Murican business model. Bill Gates did it, Zuck did it, Jeff did it, Elon does it. Embrace, Extend, Extinguish: Facebook forced tracking into the web and made profiles for people who didn't sign up, Google absorbed everything, etc.

The American business model has been Robber Barons, since the 1800s at least. OpenAI is the latest. The reality is, you can never give anything valuable to the "free market" and an internet-connected device, because you'll never be able to govern it according to your values.
musicale, almost 2 years ago
And digested them as well. We are currently in the post-digestion stage.
RecycledEle, almost 2 years ago
I could produce similar summaries based on the summaries of these books, and discussions of them, that are posted online. I think the easiest defense is to show that all the information in the plaintiffs' exhibits is also freely available online.