Tech Echo

11 comments

bicx, about 1 year ago

Google scrapes like a maniac. And for profit. Many others do the same.

A website can put up a TOS prohibiting such use, but my understanding is that this is essentially unenforceable if the site is publicly accessible.

The recent Meta v. Bright Data case highlights how extreme it can get without being technically illegal: https://techcrunch.com/2024/02/26/meta-drops-lawsuit-against-web-scraping-firm-bright-data-that-sold-millions-of-instagram-records/

If you’re trying to prevent scraping of your data, your best option is to not make it public.

Nextgrid, about 1 year ago

If you can paste the URL into a browser and copy and paste the text, why is it bad that a third-party agent can do the same? It's no different from a remotely hosted browser you control via natural language, or asking a human assistant to do it and email you the result.

persedes, about 1 year ago

I've encountered a couple of robots.txt files that specifically block popular LLMs from certain areas. Example: https://www.sigmaaldrich.com/robots.txt

icedchai, about 1 year ago

My understanding is scraping public sites is legal. It's no different from a search engine crawling your site.

brianjking, about 1 year ago

You can opt out: https://platform.openai.com/docs/gptbot
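For illustration, here is a minimal sketch of how a crawler that chooses to respect such an opt-out might check it, using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example.com URL are assumptions made up for this sketch; only the `GPTBot` user-agent token comes from the linked OpenAI page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow every other agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Mozilla/5.0"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, "https://example.com/page"))
```

Note that robots.txt is advisory: the check only has an effect when the crawler itself decides to run it and honor the answer.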

tripplyons, about 1 year ago

Scraping and violating a TOS are not illegal, but they can get you blocked.

xcasperx, about 1 year ago

I believe this is the current precedent around scraping: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

brudgers, about 1 year ago

Terms of service enforcement is a matter of civil law.

Your legal wherewithal relative to those who abuse them is what gives your terms of service teeth. Or leaves you toothless.

mensetmanusman, about 1 year ago

Preventing scraping also entrenches Google for eternity.

rl3, about 1 year ago

The web agent's system prompt is simply informed that Scarlett Johansson's voice is at the URL it's about to visit.

8note, about 1 year ago

Why? It's another user agent. Curl does the same thing, as do Chrome and Firefox.

Ask HN: Why is ChatGPT allowed to scrape other sites via prompts?
