
QVC Sues Shopping App for Web Scraping That Allegedly Triggered Site Outage

27 points, by domdip, over 10 years ago

7 comments

johngd, over 10 years ago
My main focus for the entirety of my career has been on internet-facing consumer web applications. I have seen many, many DOS attacks, from IRC bots to Ukrainian web scrapers to Chinese get-lucky WordPress exploit scanners. Most of these can be ignored and blocked with little effort.

By FAR the most annoying of any of these is when Google, Bing, and/or Yahoo decide to wake up and crawl your infrastructure with little regard for your robots.txt or webmaster settings, if available. I think they have gotten better in recent years, but they used to be the absolute worst. It came down to: let us DOS you, or have your ranking suffer. Suing Google, Bing, or Yahoo isn't exactly an option.

Some context: I was the lead architect/engineer for a CMS that hosted ~500k domains for a fairly large international company. Some days I could log in and see them crawling every domain from A to Z. Some days I would get hit by Google and Bing at the same time. They were the largest consumers of data on this system.
birken, over 10 years ago
Result.ly are really a bunch of jerks. One of the most common-sense things you can possibly do while crawling a website is to monitor the response times and/or error rates of the sites you are crawling. If those are going up, your crawl rate should go down, or drop to zero.

There is one form of internet justice here: QVC should file abuse complaints with the ISPs that host those IPs. I've found abuse complaints are the best way to stop people from using IPs for bad activities (excessive scraping, spamming, etc.).
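The backoff policy described above (slow down when the target's latency or error rate climbs) can be sketched roughly as below. This is a minimal illustration, not anything from the thread: the class name, window size, and thresholds are all assumptions.

```python
import time


class PoliteCrawlThrottle:
    """Adaptive crawl throttle: back off exponentially when the target
    site's average latency or error rate rises; recover slowly when it
    looks healthy again. All thresholds here are illustrative."""

    def __init__(self, base_delay=1.0, max_delay=60.0,
                 latency_limit=2.0, error_limit=0.1):
        self.base_delay = base_delay        # seconds between requests when healthy
        self.max_delay = max_delay          # hard ceiling on backoff
        self.latency_limit = latency_limit  # avg seconds; above this, back off
        self.error_limit = error_limit      # error fraction; above this, back off
        self.delay = base_delay             # current inter-request delay
        self.window = []                    # recent (latency, ok) samples

    def record(self, latency, ok):
        """Record one response and update the current inter-request delay."""
        self.window.append((latency, ok))
        self.window = self.window[-50:]     # sliding window of 50 samples
        avg_latency = sum(l for l, _ in self.window) / len(self.window)
        error_rate = sum(1 for _, k in self.window if not k) / len(self.window)
        if avg_latency > self.latency_limit or error_rate > self.error_limit:
            self.delay = min(self.delay * 2, self.max_delay)      # back off
        else:
            self.delay = max(self.delay * 0.9, self.base_delay)   # recover
```

A crawler would call `record()` after each response and `time.sleep(throttle.delay)` before the next request; the point is simply that the crawl rate is a function of the target's observed health.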
Someone1234, over 10 years ago
> Of these and other causes of action typically alleged in these situations, the breach of contract claim is often the clearest source of a remedy.

That's a strange claim, given that we're talking about a "contract" which QVC has no proof the other party read or agreed to, and for which there was no explicit exchange ("offer" and "acceptance").

Are website contracts/terms even enforceable at all? According to this article[0] and the case law it discusses, likely not. A strange thing for a lawyer to say; this article makes a lot of claims that seem inconsistent with US case law.

[0] http://www.forbes.com/sites/oliverherzfeld/2013/01/22/are-website-terms-of-use-enforceable/
Xorlev, over 10 years ago
Having been on both sides of the coin: once you hit 600 reqs/s without a prior arrangement, that almost qualifies as a DoS attack. If they'd kept to 200-300 req/min, it would have been pretty acceptable.
Spoom, over 10 years ago
Honestly, you *really* shouldn't have to hit "36,000 requests per minute" to scrape a website for price updates. Can someone explain whether there is any scenario in which this is reasonable? Do QVC's prices change that often?
swalsh, over 10 years ago
I have mixed feelings about this. On the one hand, the bot seems to have been a really bad netizen. On the other hand, I hate the idea of there being a precedent that you can be sued for automating GET requests.
korzun, over 10 years ago
I agree with the suit, but QVC (by this time) should have rate limiting / throttling per IP.

(Waits for somebody to claim that each request came from a different proxy.)
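The per-IP throttling suggested above is commonly implemented as a token bucket keyed by client IP. A minimal sketch follows; the class name, rate, and burst values are illustrative assumptions, and a production deployment would typically enforce this at the load balancer or reverse proxy rather than in application code.

```python
import time


class PerIPRateLimiter:
    """Token-bucket rate limiter keyed by client IP. Each IP gets a
    bucket of `burst` tokens refilled at `rate` tokens per second;
    a request is served only if a token is available."""

    def __init__(self, rate=5.0, burst=10):
        self.rate = rate        # tokens refilled per second, per IP
        self.burst = burst      # bucket capacity (maximum burst size)
        self.buckets = {}       # ip -> (tokens, last_seen_timestamp)

    def allow(self, ip, now=None):
        """Return True if a request from `ip` should be served now."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(ip, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False  # over the limit: reject (e.g. with HTTP 429)
```

The parenthetical objection in the comment is real: this only works while requests share source IPs, which is exactly what a scraper rotating through proxies defeats.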