科技回声

11 comments

throwup238, about 1 year ago

> For decades, robots.txt governed the behavior of web crawlers. But as unscrupulous AI companies seek out more and more data, the basic social contract of the web is falling apart.

The basic social contract of the web fell apart long ago when almost everyone decided that Google was the only search engine worth serving and started aggressively blocking other crawlers.

linkjuice4all, about 1 year ago

"With the rise of AI, photos of the exterior of your business are suddenly controversial"

Many revenue-based websites tried to have it both ways with web crawlers, wherein they wanted to block automated access or repeat viewers while letting first-time viewers get a free taste. Others have noted that basically Google gets a free pass for all the traffic it brings in but everyone else has to respect robots declarations.

It seems like a no-brainer: if your web server is configured to reply to GET requests with a 200 status and some content, then they get to do pretty much whatever they want with it.

Don't want to give access to everyone? Stop sending your content for free and get them to agree to some contract and authorize/license their access to your stuff.

calibas, about 1 year ago

> For decades, robots.txt governed the behavior of web crawlers.

It never governed anything; web crawlers were never under any obligation to follow robots.txt.

This article seems like they took an existing controversy, rebranded it as something new, then blamed it on AI.

aaronrobinson, about 1 year ago

Drama. Crawlers have always been controversial.

andybak, about 1 year ago

> But as unscrupulous AI companies seek out more and more data

I'm not sure I'm ready to concede the fundamental value judgement being made here. At least I refuse to accept it as a given rather than the core issue to be decided.

micromacrofoot, about 1 year ago

many crawlers have always ignored robots.txt, if you’re monitoring any moderately visited site you’re bound to see random spikes of bots hammering your server no matter what text file or headers you set

elpocko, about 1 year ago

robots.txt is relevant and effective, as is my DNT header.

amelius, about 1 year ago

When did robots.txt get a legal status? Or did it ever?

naiv, about 1 year ago

Proxy companies are a big winner now

lewhoo, about 1 year ago

I don't get it. The crux of it all seems to be that Google isn't competing with owners of data it crawls using the very same data. The crawl part isn't as much of a controversy as the usage, is it? The mentioned eBay v. Bidder's Edge (2000) seems to be a dispute over usage.

mediumsmart, about 1 year ago

The web comes in 2 versions. One of them has a basic social contract. maybe

With the rise of AI, web crawlers are suddenly controversial
