
Ask HN: Using mTurk to morally/legally get around a robots.txt disallow?

1 point by MechanicalTwerk over 11 years ago
A site offers any visitor (authenticated or not) free download of documents at a certain path. This path is disallowed from being crawled by all user agents in robots.txt. What is the consensus around using something like Mechanical Turk to distribute the process of physically clicking the free download link and collecting the documents? Would this fall into the "avoiding a technological control" category? I know, I know, I should ask a lawyer, but I'm interested in the community's opinion on the practice.
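For context, a blanket disallow of the kind described would look something like this in the site's robots.txt (the /documents/ path here is hypothetical, standing in for the download path in question):

    # Hypothetical robots.txt: ask every crawler to skip the download path
    User-agent: *
    Disallow: /documents/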

2 comments

byoung2 over 11 years ago
If you have to ask, you probably already know the answer. A more important question is what you plan to do with the files, and whether that use is allowed by the terms. If it is, you could simply ask the site to grant you access to download them. It is also possible that they disallow crawling just to reduce load on their servers, so that crawlers don't waste time on text files when there is more valuable content to crawl elsewhere on the site.
icedchai over 11 years ago
robots.txt is not legally enforceable.
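To illustrate icedchai's point: robots.txt is a voluntary convention, not an access control. A minimal sketch using Python's standard urllib.robotparser (the example.com host and path are placeholders) shows that compliance is something the client opts into:

    # Minimal sketch: robots.txt is advisory -- a client has to
    # choose to consult it before fetching anything.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # placeholder host
    rp.read()

    # This only reports what the file requests; nothing technically
    # stops a client that skips this check from fetching the URL anyway.
    print(rp.can_fetch("*", "https://example.com/documents/file1.pdf"))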