Fruit of the Poisonous Llama? (2023)

32 points by edent 3 months ago

11 comments

holowoodman 3 months ago
LLMs and code assistants were always primarily a means to copyright-wash all available things so they could be freely incorporated into commercial products without the problems of having literal copies of stuff you shouldn't have in your code.

Of course this has to backfire.

However, I'm of the opinion that this is a good thing. Copyright is a sham and needs to be abolished.

jll29 3 months ago
> I'm even happy to hear arguments about whether it is legally binding to say "No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means without the prior written permission of the publisher".

Disclaimer: IANAL.

The law of course depends on jurisdiction, but in many countries copyright law means that everything is forbidden unless (a) there is a license contract that permits the license holder to do it, or (b) it falls under one of a few pre-defined exemptions (e.g. fair use for science and teaching, which is restricted to parts of works such as chapters or individual papers for teaching in a closed group, or citation of material as scientific standards require; there is no mention of ML training).

No mention => no license.

Havoc 3 months ago
That’s always been a bit of an open secret. And not just Llama. A big chunk of the AI world is suspect, though it is harder to prove since others were more circumspect in articulating it than Meta.

Remains to be seen what the courts do. It doesn’t seem viable to put this genie back in the bottle, so it doesn’t seem like there are good options for the judge either.

rob74 3 months ago
> Taking a look at a sample file listing shows a number of books which appear to be commercially sold rather than being released for free.

"a number of books which appear to be commercially sold" is a bit of an understatement. According to a quick search of the list, it contains the complete works of the likes of Stephen King, John Grisham, Michael Crichton, Dan Brown and J.R.R Martin. So not just a few less-known books included by mistake...

gmuslera 3 months ago
This can apply to more LLMs, or to anything else trained in part on material that has some kind of license attached. GitHub's Copilot, for starters: even open source software has licenses, and such code might have been part of the training of most LLMs. If you enforce copyright in one case, then you are enabling it for the other case.

squircle 3 months ago
> I suspect we're about to hear some arguments from AI-maximalists that LLaMA is sentient and that deleting it would be akin to murder - and wiping out AIs trained on stolen property is literally genocide. I don't believe that for a second.

Methinks anthropomorphizing clocks is as silly as worrying whether the boiling water in the pot feels pain. Even if it were feasible for a machine to be conscious, it would be conscious at a lower level than "software", which, to me, has come to resemble a hyper hyperbolic information layer (emerging from, and part/parcel of, human consciousness). (Call it an egregore if you want, but to me it seems more like a data lake.) This may be a platitude, but any agency the machine possesses is due to the human agency that nudged it in such a direction. We've created a lovely mirror test for ourselves.

neom 3 months ago
Related: https://news.ycombinator.com/item?id=37379297

spacebanana7 3 months ago
Courts and lawyers only have so much power.

Yes, they generally won the digital IP battles of the 90s, but this is much harder. Millions of people have downloaded Llama models and there's a great variety of derivative and distilled models. It's an open secret that practically every AI company uses copyrighted data.

Moreover, there's the political angle. If the US forces the West to obey copyright law for AI, it'll be very hard to compete with China.

Philpax 3 months ago
> I suspect we're about to hear some arguments from AI-maximalists that LLaMA is sentient and that deleting it would be akin to murder - and wiping out AIs trained on stolen property is literally genocide. I don't believe that for a second.

...really? There are strawmen, and then there's this. Even if there were more Blake Lemoines out there, the "consciousness" of an LLM begins and ends with its context window. The weights by themselves are not alive.

2c2c2c 3 months ago
watching society 180 and start simping for copyright law is so depressing

Ukv 3 months ago
> isn't this a slam dunk case? Meta literally published a paper where they said "We trained this AI on Intellectual Property which we knew had been obtained without the owners' consent."

Making/receiving copies without authorization of the rightsholder can be permissible - it'll come down to a fair use analysis. The purpose seems to me highly transformative (and "The more transformative the new work, the less will be the significance of other factors"), but the other factors could swing the other way.

Worth noting that when Google Books was determined to be fair use, the amount/substantiality factor didn't weigh against them despite storing full copies of millions of books, because "what matters in such cases is not so much "the amount and substantiality of the portion used" in making a copy, but rather the amount and substantiality of what is thereby made accessible to a public". That could be seen here as the amount present in generated responses, as opposed to everything the model was trained on.

> I suspect we're about to hear some arguments from AI-maximalists that LLaMA is sentient and that deleting it would be akin to murder - and wiping out AIs trained on stolen property is literally genocide.

Not sure if I can call this a strawman since there will inevitably be someone somewhere making an argument like this, but it's not a defense being used in the lawsuits in question.

My primary issues with "wiping out AIs trained on stolen property" are:

1. It's not just LLMs that are trained like this. If you're making a model to segment material defects or detect tumors, you typically first pre-train on a dataset like ImageNet before fine-tuning on the (far smaller) task-specific dataset. Even if you believe LLMs are mostly hype, there's a whole lot else - much of which is fairly uncontroversially beneficial - that you'd be inhibiting.

2. Copyright's basis is "To promote the Progress of Science and useful Arts". Wiping out existing models, and likely stifling future ones, seems hard to justify under this basis. Ensuring rightsholders profit is intended as a means to achieve such progress, not a goal in and of itself to which progress can take a back seat.

3. I do not believe stricter copyright law would help individuals. Realistically, developers training models would go to Reddit/X/GitHub/Getty/etc., which would sell licensed (by ToS agreement) user content, and there's little incentive for those companies to pass on the profit beyond maybe some temporary PR moves. Much of what's possible for open-source or academic communities may no longer be, on account of licensing fees.

4. It doesn't seem politically viable to demand that models be wiped out. Leading in the field, and staying ahead of China, is currently seen as a big, important issue - we're not going to erase our best models because the NYT asked. One could hope for mandatory licensing - I think it'd still likely be a negative for open-source development, but it's more plausible than deleting models trained on copyrighted material.