Hi guys. We were actually about to do an "Ask HN: Review our startup" post, but I guess someone beat us to it.<p>So, please review our startup. :)<p>We are launching the beta today to a handful of users and will let in more users over time.<p>One other note: We don't just offer crawling. Our model is to let you analyze the web content that you discover. Using your own custom code that you push into 80legs, you can do sophisticated text processing, process images, look inside PDFs, etc.
Interesting, it's a botnet! From the FAQ: "How can the prices be so low?" "Plura pays developers to embed lightweight widgets in their desktop applications or websites. These widgets harness the idle and excess bandwidth and computing power on the computers of people using the applications and websites."
Very interesting service! A number of questions...<p>What User-Agent do you use?<p>Do you crawl non-textual resources?<p>Do you save all headers from the crawled responses?<p>Do you perform any processing on the returned content (like de-chunking or decompressing), or can it be retrieved verbatim?<p>If two customers request that the same URL/site be crawled, are their requests merged so the site is only crawled once?<p>Do you save the exact time of the request (rather than trusting the returned 'Date' header)?