Ask HN: What are some of the major problems being faced because of web scraping?

8 points by nachivpn over 10 years ago
Disclaimer: I work for an anti-scraping service company. I am not trying to advertise it, but simply trying to understand the problems that people are actually facing because of web scraping and how it is affecting them.

4 comments

jpetersonmn over 10 years ago
I'm sure there are instances where scraping websites causes legitimate issues; however, most of the complaining I've seen from website operators was about the perceived theft of their data (even though it was publicly available through the browser), not so much about a bandwidth or performance issue that the scraping causes.

I'm of the opinion that web scraping has an unwarranted bad reputation. As long as I'm respecting your robots.txt and not scraping behind logins, etc., then it's no different from how Google operates.
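For readers wondering what "respecting your robots.txt" might look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, and the `example.com` URLs are all made up for illustration; a real scraper would fetch the site's actual file instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed inline so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (rather than fetching it)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "example-bot" and the example.com URLs are assumptions for illustration only.
allowed = may_fetch(parser, "example-bot", "https://example.com/index.html")
blocked = may_fetch(parser, "example-bot", "https://example.com/private/data")
delay = parser.crawl_delay("example-bot")  # seconds requested by the site, or None
```

Checking every URL before requesting it, and honoring any crawl delay the site asks for, is roughly the baseline of polite scraping described above.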
joshschreuder over 10 years ago
I think bandwidth costs and the possibility of accidentally DDoSing the site if the scraper gets out of control are probably big issues, along with the 'theft of data' mentioned.
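One common way to keep a scraper from "getting out of control" is to enforce a minimum gap between requests. The sketch below is only an illustration of that idea, not anything proposed in the thread: the `Throttle` class and its parameters are hypothetical names, and the clock and sleep functions are injectable purely so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        waited = 0.0
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited
```

A scraper loop would call `throttle.wait()` before each request, for example with `Throttle(5.0)` to send at most one request every five seconds.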
mattwritescode over 10 years ago
Surely you should know the problems if you are working for an anti-scraping company... Anyway...

Most people who own small websites don't necessarily know their website is being scraped on a daily basis (talking sole traders, tiny businesses). If they are paying for AdWords or local advertising through parish or county community websites, then they may think they are getting more bang for the buck than they actually are. If they get 10 visitors a day and 8 of those are scrapers, what does this really mean for their advertising revenue? Obviously they should be basing their return on investment on revenue, but still, a website is seen as a big thing for most small businesses.
iqonik over 10 years ago
Google penalising a site for not having original content may be one. Ofc, it uses bandwidth and costs the site you're scraping resources/money for no benefit to them.