TechEcho
80 legs: Web Crawler as a Service

59 points by luckystrike about 16 years ago

3 comments

westside1506 about 16 years ago
Hi guys. We were actually about to do an "Ask HN: Review our startup" post, but I guess someone beat us to it.

So, please review our startup. :)

We are launching the beta today to a handful of users and will be letting in more and more users over time.

One other note: We don't just offer crawling. Our model is actually to allow you to analyze the web content that you discover. Using your own custom code that you push into 80legs, you can do sophisticated text processing, image processing, look inside PDFs, etc.
mjs about 16 years ago
Interesting, it's a botnet! From the FAQ: "How can the prices be so low?" "Plura pays developers to embed lightweight widgets in their desktop applications or websites. These widgets harness the idle and excess bandwidth and computing power on the computers of people using the applications and websites."
gojomo about 16 years ago
Very interesting service! A number of questions...

What User-Agent do you use?

Do you crawl non-textual resources?

Do you save all headers from the crawled responses?

Do you perform any processing on the returned content (like de-chunking or de-compressing) or can it be retrieved verbatim?

If two customers request the same URL/site be crawled, are their requests merged so the site is only crawled once?

Do you save the exact time of the request (not trusting the returned 'Date' header)?