
QVC Sues Shopping App for Web Scraping That Allegedly Triggered Site Outage

27 points, by domdip, over 10 years ago

7 comments

johngd, over 10 years ago
My main focus for the entirety of my career has been on internet-facing consumer web applications. I have seen many, many DOS attacks, from IRC bots to Ukrainian web scrapers to Chinese get-lucky WordPress exploit scanners. Most of these can be ignored and blocked with little effort.

By FAR the most annoying of any of these is when Google, Bing, and/or Yahoo decide to wake up and crawl your infrastructure with little regard for your robots.txt or webmaster settings, if available. I think they have gotten better in recent years, but they used to be the absolute worst. It came down to: let us DOS you, or have your ranking suffer. Suing Google, Bing, or Yahoo isn't exactly an option.

Some context: I was the lead architect/engineer for a CMS that hosted ~500k domains for a fairly large international company. Some days I could log in and see them crawling every domain from A to Z. Some days I would get hit by Google and Bing at the same time. They were the largest consumers of data on this system.
birken, over 10 years ago
Result.ly are really a bunch of jerks. One of the most common-sense things you can possibly do while crawling a website is to monitor the response times and/or error rates of the sites you are crawling. If those are going up, your crawl rate should go down, or drop to zero.

There is one form of internet justice here: QVC should file abuse complaints with the ISPs that host those IPs. I've found abuse complaints are the best way to stop people from using IPs for bad activities (excessive scraping, spamming, etc.).
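The backoff policy described above (slow down when the target's latency or error rate climbs) can be sketched roughly as below. This is a minimal illustration, not anything from the thread: the class name, window size, and thresholds are all assumptions.

```python
import time


class PoliteCrawlThrottle:
    """Adaptive crawl throttle: back off exponentially when the target
    site's average latency or error rate rises; recover slowly when it
    looks healthy again. All thresholds here are illustrative."""

    def __init__(self, base_delay=1.0, max_delay=60.0,
                 latency_limit=2.0, error_limit=0.1):
        self.base_delay = base_delay        # seconds between requests when healthy
        self.max_delay = max_delay          # hard ceiling on backoff
        self.latency_limit = latency_limit  # avg seconds; above this, back off
        self.error_limit = error_limit      # error fraction; above this, back off
        self.delay = base_delay             # current inter-request delay
        self.window = []                    # recent (latency, ok) samples

    def record(self, latency, ok):
        """Record one response and update the current inter-request delay."""
        self.window.append((latency, ok))
        self.window = self.window[-50:]     # sliding window of 50 samples
        avg_latency = sum(l for l, _ in self.window) / len(self.window)
        error_rate = sum(1 for _, k in self.window if not k) / len(self.window)
        if avg_latency > self.latency_limit or error_rate > self.error_limit:
            self.delay = min(self.delay * 2, self.max_delay)      # back off
        else:
            self.delay = max(self.delay * 0.9, self.base_delay)   # recover
```

A crawler would call `record()` after each response and `time.sleep(throttle.delay)` before the next request; the point is simply that the crawl rate is a function of the target's observed health.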
Someone1234, over 10 years ago
> Of these and other causes of action typically alleged in these situations, the breach of contract claim is often the clearest source of a remedy.

That's a strange claim, given that we're talking about a "contract" which QVC has no proof the other party read or agreed to, and for which there was no explicit exchange ("offer" and "acceptance").

Are website contracts/terms even enforceable at all? According to this article[0] and the case law it discusses, likely not. A strange thing for a lawyer to say; this article makes a lot of claims that seem inconsistent with US case law.

[0] http://www.forbes.com/sites/oliverherzfeld/2013/01/22/are-website-terms-of-use-enforceable/
Xorlev, over 10 years ago
Having been on both sides of the coin: once you hit 600 reqs/s without a prior arrangement, that almost qualifies as a DoS attack. If they'd kept to 200-300 req/min, it would have been pretty acceptable.
Spoom, over 10 years ago
Honestly, you *really* shouldn't have to hit "36,000 requests per minute" to scrape a website for price updates. Can someone explain whether there is any scenario in which this is reasonable? Do QVC's prices change that often?
swalsh, over 10 years ago
I have mixed feelings about this. On the one hand, the bot seems to have been a really bad netizen. On the other hand, I hate the idea of there being a precedent that you can be sued for automating GET requests.
korzun, over 10 years ago
I agree with the suit, but QVC (by this time) should have rate limiting / throttling per IP.

(Waits for somebody to claim that each request came from a different proxy.)
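The per-IP throttling suggested above is commonly implemented as a token bucket keyed by client IP. A minimal sketch follows; the class name, rate, and burst values are illustrative assumptions, and a production deployment would typically enforce this at the load balancer or reverse proxy rather than in application code.

```python
import time


class PerIPRateLimiter:
    """Token-bucket rate limiter keyed by client IP. Each IP gets a
    bucket of `burst` tokens refilled at `rate` tokens per second;
    a request is served only if a token is available."""

    def __init__(self, rate=5.0, burst=10):
        self.rate = rate        # tokens refilled per second, per IP
        self.burst = burst      # bucket capacity (maximum burst size)
        self.buckets = {}       # ip -> (tokens, last_seen_timestamp)

    def allow(self, ip, now=None):
        """Return True if a request from `ip` should be served now."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(ip, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False  # over the limit: reject (e.g. with HTTP 429)
```

The parenthetical objection in the comment is real: this only works while requests share source IPs, which is exactly what a scraper rotating through proxies defeats.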