TechEcho

11 comments

bicx 12 months ago

Google scrapes like a maniac. And for profit. Many others do the same. A website can put up a TOS prohibiting such use, but my understanding is that it is essentially unenforceable if the site is publicly accessible. The recent Meta v. Bright Data case highlights how extreme it can get without being technically illegal: https://techcrunch.com/2024/02/26/meta-drops-lawsuit-against-web-scraping-firm-bright-data-that-sold-millions-of-instagram-records/ If you’re trying to prevent scraping of your data, your best option is to not make it public.

Nextgrid 12 months ago

If you can paste the URL into a browser and copy and paste the text, why is it bad that a third-party agent can do the same? It's no different from a remotely hosted browser you control via natural language, or asking a human assistant to do it and email you the result.

persedes 12 months ago

I've encountered a couple of robots.txt files that specifically block popular LLM crawlers from certain areas. Example: https://www.sigmaaldrich.com/robots.txt

icedchai 12 months ago

My understanding is scraping public sites is legal. It's no different from a search engine crawling your site.

brianjking 12 months ago

You can opt out: https://platform.openai.com/docs/gptbot
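For anyone wondering what the opt-out looks like in practice: it works through robots.txt, by disallowing the `GPTBot` user agent. Below is a minimal sketch that checks such a rule with Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URL are made up for illustration. Keep in mind that robots.txt is a voluntary convention, so this only tells you what a well-behaved crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot everywhere, leaves other crawlers alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before you call `can_fetch()`.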

tripplyons 12 months ago

Scraping and violating TOS are not illegal to do, but they can get you blocked.

xcasperx 12 months ago

I believe this is the current precedent around scraping: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

brudgers 12 months ago

Terms of service enforcement is a matter of civil law. Your legal wherewithal relative to those who abuse your terms is what gives them teeth. Or leaves you toothless.

mensetmanusman 12 months ago

Preventing scraping also entrenches Google for eternity.

rl3 12 months ago

The web agent's system prompt is simply informed that Scarlett Johansson's voice is at the URL it's about to visit.

8note 12 months ago

Why? It's another user agent. curl does the same thing, as do Chrome and Firefox.
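To make the "another user agent" point concrete: from the server's side, the main thing that distinguishes these clients is the `User-Agent` header they choose to send, and any client can set it to anything. A rough sketch with Python's standard `urllib.request` follows; the `build_request` helper and the user agent strings are invented for illustration, and nothing is actually sent over the network.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request carrying the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Illustrative user agent strings, not the exact ones any real client sends.
    agents = ("curl/8.0", "Mozilla/5.0 (hypothetical browser)", "ExampleAgent/0.1")
    for ua in agents:
        req = build_request("https://example.com/", ua)
        # Request stores header names capitalized, hence "User-agent" here.
        print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

Apart from that one self-reported header, the three requests are identical, which is why blocking by user agent only works against clients that identify themselves honestly.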

Ask HN: Why is ChatGPT allowed to scrape other sites via prompts?
