Microsoft says that it's okay to steal web content because it's 'freeware.'

37 points by blinding-streak 11 months ago

8 comments

gerdesj 11 months ago
Someone's reasons for sharing information are coloured by the situation at the time of sharing it, amongst many other factors.

Two years ago (say) no one predicted the meteoric rise of LLMs and their voracious appetite for data sets for training. These beasties are not simply search engines that are better direction pointers to your stuff (with a frisson of ads) but insist on being the final word and keep you out. To be blunt: It is stealing.

The implied contract for publishing on the web has changed again, just as it has several times in the past. The worst thing here is the use of the term "freeware". Describing original content, displayed for all to see as -ware is outrageous.

They might as well describe the content on Spotify and co as freeware ... bear with me: you could scrape wifi connections through your publicly available APs or even do some more broadband funky spectrum capture analysis and claim that is what an internet search engine does in its spare time and all is fine (lol).

LLMs and GenAI are quite interesting things but I do not think that they are the last word in ... AI. Anyway the latest cool thingie cannot be allowed to break whatever the current unspoken and somewhat undefined social contract is in place.

This bloke from MS seems to have forgotten that there really is a social contract of some sort and that if you say: "fuck you lot, omnomnom ... mmmm data ... ... laters (lol)" there might be some come back.
taspeotis 11 months ago
It's a pro-AI position but not really controversial?

My reading is he is saying content that is not under an explicit license for usage, that is made available publicly and freely, is fair game for training.

> In his remarks, Suleyman claimed that all content shared on the web is available to be used for AI training unless a content producer says otherwise specifically.

> "With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding," said Suleyman.

> "There's a separate category where a website or a publisher or a news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me so that other people can find that content.' That's a gray area and I think that's going to work its way through the courts."
candiddevmike 11 months ago
It's ironic that Microsoft used copyright protection and IP law for years to secure a dominant market position, and now they don't need to play by the same rules because "something something AI".
dingosity 11 months ago
Before we get too upset... can we verify this is MSFT's official position? I suspect this *may* be hyperbole. It *could* be Suleyman was constructing a hypothetical point that didn't survive translation into click-bait. That being said... MSFT has a history of chicanery. I'm off to try to find original sources. If anyone else has any, please provide a link.

FWIW... I found a few videos related to Endicott's story:

* This is a quick 5 minute video where Suleyman talks about how indeterminacy is good. So... you know... it's a *good* thing that Co-Pilot can't tell you why it thinks it needs to dump 800 lines of java code into your hello world program. At around 3:44, he confuses LLMs (with a surface understanding of syntax married with a markov chain on steroids) with people (who as best we can tell have a different understanding of the thing represented.) Corporate management confusing the map with the territory? Who could have foreseen such a thing: https://youtu.be/GsGFYoIx1YM

* This one seems to be the longer version. I'm still looking for where Endicott's quote comes from, but around the 14-minute mark the conversation turns towards "who owns the IP" used to train LLMs, and the terms "Fair Use" and "Freeware" are used around the 14m50s mark: https://youtu.be/lPvqvt55l3A

[EDIT: So... yes... get out the pitch-forks... Microsoft is saying anything on the web is inherently freeware or subject to fair use even if you think you remember putting a copyright notice on it (or, as is mentioned in US copyright law, the creator automatically receives copyright protections upon creation of the work.)]
mmh0000 11 months ago
Of course it's okay.

I make an HTTP _REQUEST_, the server voluntarily fulfills the request.

Why is it okay for a person to view your content, memorize it, and use it as a base for new content while it's not okay for an AI? At the end of the day it is the same thing.
nineteen999 11 months ago
The Windows source code was leaked onto the web many years ago, wasn't it? Guess that makes it freeware too.
ginvok 11 months ago
With this logic, so is pirated software, right? It's free because it's on the internet.
userbinator 11 months ago
This doesn't deprive the original owner, so they should use "share" or "pirate" instead.