Microsoft's AI boss thinks it's perfectly OK to steal content if it's on open web

70 points by avivallssa 11 months ago

19 comments

jimmaswell 11 months ago
From the beginning, it's seemed completely intuitive to me that training a computer made of sand on publicly available content and then generating art later should be fair use, so long as it's fair use to train the meat computer in your head on the same content and then use it to generate art later. There's no meaningful difference to me as far as the ethics of the act are concerned.
jsyang00 11 months ago
No he doesn't.
> I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.
> There’s a separate category where a website, or a publisher, or a news organization had explicitly said ‘do not scrape or crawl me for any other reason than indexing me so that other people can find this content.’ That’s a grey area, and I think it’s going to work its way through the courts.
JonChesterfield 11 months ago
I'll bet they don't consider the Windows and Office source code fair game for arbitrary reuse provided the other party found the copy on the web. Even if the person found the copy on GitHub.
beefnugs 11 months ago
Isn't having this discussion at all stupidly letting them control the goalposts? They have already gone far beyond this, thinking that everything someone does on their own personal computer in their own home, without the slightest bit of consent, is going to be slurped up and recorded in case they want to query it someday.
This is like arguing that this guy who just murdered someone 10 minutes ago should actually be able to steal the candy from this child since the child put it down on the park bench.
starik36 11 months ago
The more I read about this guy the more I get the feeling that he is an unscrupulous individual.
robots.txt is a "grey area" to him, instead of being a directive to keep moving? Wow.
mewpmewp2 11 months ago
What exactly is wrong with the statement he has made?
1vuio0pswjnm7 11 months ago
He compares fear of "AI" to fear of calculators. But "AI" cannot do math. Calculators do not "hallucinate". They are not correct "80%" of the time. They are correct 100% of the time. We know how they work. IIRC, in the 1970s someone at Bell Labs wrote a UNIX program that could generate fake academic papers. It might be a fun gag but it does not have much practical utility. No matter how "real" the papers might appear, or even if they are correct "80%" of the time, it is not an "invention", and it is certainly not comparable to a calculator.
avivallssa 11 months ago
Will this make people who make indirect money through their content less motivated to publish their content on the Web? This might be arguable.
Maybe there should be a similar amount of openness in publishing the content used for training commercial models.
The copyright owner should have the privilege to ask for that content to be removed from training. This may also allow individual authors to gain their share with their own Advanced RAG applications, which are specifically focused on the content they own and have also published on the web.
29athrowaway 11 months ago
One thing is a robots.txt policy, meant mostly for search crawlers.
Another thing is the copyright of the content, terms of use policies, etc.
Abiding by a robots.txt policy doesn't make you immune to copyright, terms of service, law in various jurisdictions, etc. If you think that, you are probably a kleptomaniac.
Just create a robots.txt with "User-Agent: one billion asterisks" so that the crawlers die when parsing it.
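For context on what honoring a robots.txt policy looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` and `SearchBot` agent names, and the example.com URLs are all hypothetical, chosen only for illustration. As the comment above points out, passing this check says nothing about copyright or terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI training crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before fetching each URL.
ai_bot_allowed = parser.can_fetch("ExampleAIBot", "https://example.com/article")
search_allowed = parser.can_fetch("SearchBot", "https://example.com/article")
search_private = parser.can_fetch("SearchBot", "https://example.com/private/x")

print(ai_bot_allowed, search_allowed, search_private)
```

The check is purely advisory: nothing stops a crawler from skipping it, which is why the thread treats robots.txt and copyright as separate questions.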
sircastor 11 months ago
It seems obvious to me that there is no such thing as AI without publicly training on the open web, and that any kind of licensing is an impossible feat.
Programs from my youth (Daria, Captain N) had licensed music for their broadcast, and that’s all because what else was ever going to be done? 20 years later, streaming with the music intact is an impossibility because the kind of money necessary to license *all* of it was too much. And you have to make deals with dozens of companies.
Multiply that by several orders of magnitude and you start to see the scope of the problem.
sircastor 11 months ago
Part of the problem here is that the web has gone through lots of change as to what it is and how people understand it.
Some people think of it as billboards posted on the highway. Some think it’s a bulletin board. Some think it’s a newspaper. A television, a “zine”, a diary, graffiti. It has been all of these things, and is and isn’t. And people who publish are really bad at explicitly stating which one they are. But they expect you to know.
fimdomeio 11 months ago
So we've now learned that copyright is determined by communications protocol. If you're using torrents it's copyright infringement, if it's the web then it's public domain.
boring-alterego 11 months ago
Hmm, hear me out: go to a public website and add black space below any video or picture, with random adjectives that are your satirical review of that piece of art, then feed those into the AI model and tell it to ignore any text.
KoolKat23 11 months ago
This is nothing but performative clickbait by the Verge.
It is classified as fair use; the term is "transformative use," where those using it are training models (their intention), if anyone wishes to Google it.
The end.
whacko_quacko 11 months ago
scraping the open web shouldn't be a crime[1], even if unsavoury people do it for unsavoury purposes
[1]: or even just an issue
byyll 11 months ago
It's not stealing content if the content is still in the original place. Stop trying to redefine words. It's copying.
93po 11 months ago
If buying isn't owning, copying isn't stealing. This is a really tired argument.
cjk2 11 months ago
Ah yes the implied social contract that it's ok because it happens all the time.
That's how society falls.
tiahura 11 months ago
The open web's ethos since its inception in the 1990s has been one of unrestricted access and fair use. Content published openly online inherently invites broad consumption, reproduction, and creative reuse by the public. This is not merely custom, but a fundamental aspect of fair use doctrine as applied to the digital realm.
The four factors of fair use (purpose of use, nature of the copyrighted work, amount used, and effect on the market) overwhelmingly favor allowing free use of openly published web content. The transformative nature of most reuses, the public availability of the original works, the necessity of using entire works in many cases, and the lack of a traditional market for such content all support this interpretation.
This longstanding practice has been the catalyst for unprecedented innovation and information dissemination. It represents a tacit social contract between content creators and users, establishing a de facto "freeware" model for open web content. Any attempt to retroactively impose strict copyright limitations would not only stifle innovation but also contradict decades of established legal precedent and digital norms.
As a side note, I’m not certain that training necessarily involves “copying.”
Lastly, if anyone really thinks the Roberts Court is going to knee-cap AI, you’re soft in the head.