科技回声

Ask HN: Why is ChatGPT allowed to scrape other sites via prompts?

28 points by jbryu about 1 year ago
The fact that I can give ChatGPT any URL and extract html content from it feels like a big TOS breach for most sites. Am I misunderstanding something about the legality of scraping? Aren't developers discouraged from scraping like this in the first place for for-profit projects?

11 comments

bicx about 1 year ago
Google scrapes like a maniac, and for profit. Many others do the same.

A website can put up a TOS prohibiting such use, but my understanding is that this is essentially unenforceable if the site is publicly accessible.

The recent Meta v Bright Data case highlights how extreme it can get without being technically illegal. https://techcrunch.com/2024/02/26/meta-drops-lawsuit-against-web-scraping-firm-bright-data-that-sold-millions-of-instagram-records/

If you’re trying to prevent scraping of your data, your best option is to not make it public.
Nextgrid about 1 year ago
If you can paste the URL in a browser and copy-paste the text, why is it bad that a third-party agent can do the same? It's no different from a remotely hosted browser you control via natural language, or asking a *human* assistant to do it and email you the result.
persedes about 1 year ago
I've encountered a couple of robots.txt files that specifically block popular LLM crawlers from certain areas. Example:

https://www.sigmaaldrich.com/robots.txt
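
To illustrate how per-crawler, per-area rules like these are evaluated, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the paths, and the `is_allowed` helper are hypothetical and are not taken from the site linked above; `GPTBot` is the user-agent token OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one LLM crawler is kept out of one area,
# while every other agent is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /catalog/

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeBrowser"):
        for path in ("/catalog/item", "/private/page"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that matches a named group follows only that group's rules, so in this sketch `GPTBot` is blocked from `/catalog/` but not from `/private/`. Compliance is voluntary on the crawler's side; robots.txt does not enforce anything by itself.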
icedchai about 1 year ago
My understanding is that scraping public sites is legal. It's no different from a search engine crawling your site.
brianjking about 1 year ago
You can opt out.

https://platform.openai.com/docs/gptbot
tripplyons about 1 year ago
Scraping and violating a TOS are not illegal, but they can get you blocked.
xcasperx about 1 year ago
I believe this is the current precedent around scraping:

https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
brudgers about 1 year ago
Terms of service enforcement is a matter of civil law.

Your legal wherewithal relative to those who abuse them is what gives your terms of service teeth. Or leaves you toothless.
mensetmanusman about 1 year ago
Preventing scraping also entrenches Google for eternity.
rl3 about 1 year ago
The web agent's system prompt is simply informed that Scarlett Johansson's voice is at the URL it's about to visit.
8note about 1 year ago
Why? It's another user agent. curl does the same thing, as do Chrome and Firefox.
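
As a rough illustration of the "just another user agent" point, the sketch below builds an HTTP request with Python's standard `urllib.request` and an arbitrary `User-Agent` header. The `build_request` helper, the URL, and the agent string are hypothetical; nothing is sent over the network.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request that identifies itself as user_agent."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Hypothetical agent string; browsers, curl, and crawlers each send their own.
    req = build_request("https://example.com/", "example-agent/0.1")
    # urllib stores header names capitalized, hence "User-agent" here.
    print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

From the server's side, this header is the main self-reported signal that distinguishes one client from another, which is why robots.txt rules and server-side blocks are usually keyed on it.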