Ask HN: Is there any license that is designed to exclude LLMs?

35 points by urlwolf, 6 months ago
I don't want my content to be harvested by LLMs; they are removing attribution, among other things. Otherwise, I'd like to stick as close as possible to the open source licenses (say MIT). Is there such a license out there? If not, is anyone working on such a thing?

So far what we have learned is that robots.txt doesn't work; major sites are using login-only access with 2FA to have any hope of keeping their content away from LLMs. I imagine the licenses would be one thing, but actually implementing/enforcing them might be a whole other can of worms!

7 comments

kouteiheika, 6 months ago
The LLMs' training data is already mostly All Rights Reserved content, which is more restrictive than whatever license you could come up with, and if that doesn't stop anyone then sure as hell you won't stand a chance either.

Your best bet to fight back is to either try to poison your data, or to train your own models on *their* data.
Ukv, 6 months ago
If machine learning is found to be fair use, the license you choose does not matter - in the same way Google Books can scan books and make them searchable without a specific license to do so.

If machine learning is *not* found to be fair use, and your concern is the removal of attribution, then the MIT license should be fine.

> So far what we have learned is that robots.txt doesn't work;

The companies training models that I'm aware of [0][1][2] all respect robots.txt for their crawling. I can't necessarily guarantee that all of them do, but the fact that smaller players are likely to use CommonCrawl (which also follows robots.txt [3]) means it should catch the vast majority of cases, and I'd recommend it if you don't want your work trained on.

> major sites are using login-only access with 2FA to have any hope to keep their content away from LLMs

I suspect it's more that users with accounts are more valuable than lurkers, and framing forced sign-up as protecting user data from LLMs is a convenient excuse.

[0]: https://platform.openai.com/docs/bots
[1]: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
[2]: https://blog.google/technology/ai/an-update-on-web-publisher-controls/
[3]: https://commoncrawl.org/faq
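
For anyone who wants to try the robots.txt route, here is a minimal sketch of what an opt-out could look like and how a compliant crawler would read it. It assumes the user-agent tokens GPTBot, ClaudeBot, Google-Extended, and CCBot are still the ones documented at the vendor pages linked above, so check those pages before relying on it. The `allowed` helper and the example.com URL are hypothetical, made up for illustration. This only models crawlers that honor robots.txt; it does nothing about ones that ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of some AI-training crawlers,
# leave everything else allowed. The tokens are assumptions taken
# from the vendors' public docs and may change over time.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/1"  # placeholder URL
    for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Mozilla/5.0"):
        print(f"{agent}: {'allowed' if allowed(agent, page) else 'blocked'}")
```

The four named crawlers come back as blocked and an ordinary browser user agent as allowed, which is the whole extent of what robots.txt can express: a request, not an enforcement mechanism.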
krapp, 6 months ago
You don't have a choice. Any content you put online will be harvested by LLMs regardless of your intent, or any license you post to the contrary. That's already the norm and it isn't going to change any time soon.

hehehheh's comment is your best option - poison your content when possible. It's still going to be consumed, but at least you can make the LLMs choke on it. Second best option is to never post content to the free internet, but even that's just a temporary measure - all accessible data (including private data) will be assimilated eventually. But expecting a license to work in a post-LLM world is just naive.
hehehheh, 6 months ago
Best license then would be an LLM poisoning attack.
DamonHD, 6 months ago
Any licence that requires attribution *should* be enough *in principle*, e.g. CC BY 4.0, Apache 2.0.
ranger_danger, 6 months ago
If you care about it being an OSI-approved license (or purists arguing that it's not really "open source"), then any restrictions on who/what can use the software violate the FSF's "freedom zero": https://www.gnu.org/philosophy/free-sw.en.html#four-freedoms
brudgers, 6 months ago
*but actually implementing/enforcing them might be a whole other can of worms!*

Are you assuming out-lawyering Google, OpenAI, etc. is *only* a can of worms?

A license is only as good as your legal wherewithal to enforce it. Good luck.