Ask YC: How to web scrape 100s of sites/forms?

1 point by bdouglas almost 17 years ago
Hi...

A potential research app requires scraping a few hundred websites/forms and diving into the child links to obtain the parent/child structure, i.e. company -> dept -> title -> name.

In this case, this would involve going 4 levels deep and getting the required information at each level.

So, does anyone know of a method/app/company that can be used to accomplish this? Or am I going to have to figure out how to get a number of cheap guys to write a bunch of Python scripts?!

Thanks
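For reference, the 4-level crawl described in the question (company -> dept -> title -> name) can be sketched with nothing but Python's standard library. This is a minimal sketch under stated assumptions: each page lists its child items as plain `<a>` links, the deepest level (names) is read from link text on the level-3 pages rather than fetched, and `crawl`, `LinkCollector`, `fetch` and `keep` are hypothetical names. Passing `fetch` in keeps the crawl logic separate from HTTP, so it can be tried against canned pages; in practice it might wrap `urllib.request.urlopen`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl(url, fetch, levels=4, depth=1, keep=lambda href: True, seen=None):
    """Return {anchor text: subtree} for the pages below `url`.

    `fetch(url) -> html` is supplied by the caller (hypothetical hook).
    Pages at depths 1..levels-1 are fetched; links found on the deepest
    fetched pages become leaves (value None). `keep` drops nav links.
    """
    seen = set() if seen is None else seen
    seen.add(url)
    parser = LinkCollector()
    parser.feed(fetch(url))
    parser.close()
    tree = {}
    for href, text in parser.links:
        child = urljoin(url, href)
        if child in seen or not keep(href):
            continue  # avoid cycles (e.g. "Home") and filtered links
        if depth + 1 < levels:
            tree[text] = crawl(child, fetch, levels, depth + 1, keep, seen)
        else:
            tree[text] = None  # leaf level: record the name, don't fetch
    return tree


if __name__ == "__main__":
    # Canned pages standing in for a real site (example.com is a placeholder).
    pages = {
        "http://example.com/": '<a href="/sales">Sales</a>',
        "http://example.com/sales": '<a href="/">Home</a> <a href="/sales/mgr">Manager</a>',
        "http://example.com/sales/mgr": '<a href="/people/alice">Alice</a>',
    }
    print(crawl("http://example.com/", pages.__getitem__))
    # → {'Sales': {'Manager': {'Alice': None}}}
```

Two limitations worth knowing: duplicate anchor text on one page overwrites earlier entries, and a page reachable from two parents is only visited once because of `seen`.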

5 comments

nreece almost 17 years ago
"...cheap guys to write a bunch of python scripts"

You know what will be 'cheap'? Writing it yourself.
qhoxie almost 17 years ago
Libraries like mechanize and hpricot are shortening the learning curve for scraping tasks. That's not to say it is easy, but it should not take a bunch of people working on it. In my opinion, one good developer with the right experience would be plenty.
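One thing mechanize takes care of is the "forms" half of the question: load a page, fill in a form, submit it. As a rough standard-library approximation (not mechanize's API), the sketch below reads the first `<form>` on a page, merges caller-supplied values into its `<input>` defaults, and builds an unsent `urllib.request.Request`. It assumes simple forms: `<select>`, `<textarea>`, checkbox/radio semantics and JavaScript-driven forms are ignored, and `FormParser` and `build_form_request` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Record the first <form>'s action, method and named <input> values."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and a.get("name"):
            # Hidden fields (e.g. session tokens) keep their default value.
            self.fields[a["name"]] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_form_request(page_url, html, values):
    """Fill the first form on `page_url` with `values`; return an unsent Request."""
    form = FormParser()
    form.feed(html)
    form.close()
    body = urlencode({**form.fields, **values})
    target = urljoin(page_url, form.action)  # empty action -> same page
    if form.method == "POST":
        return Request(target, data=body.encode("ascii"), method="POST")
    sep = "&" if "?" in target else "?"
    return Request(target + sep + body)
```

Sending it is then `urllib.request.urlopen(req)`; sites that track sessions would also need cookie handling via `http.cookiejar`, which is the kind of bookkeeping a library like mechanize saves you from writing.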
olefoo almost 17 years ago
Or get one expensive guy to write you a script that writes the scripts to scrape the sites by scraping the sites to read the structure to write the scripts to scrape the sites.
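olefoo's reply is tongue-in-cheek, but the "read the structure" half has a modest, practical form: instead of hand-writing a link rule for each of a few hundred sites, guess which links on a page are the child items by grouping them by URL shape. This is a crude heuristic under a loud assumption, namely that a site's child items share a parent path (e.g. `/dept/sales`, `/dept/hr`) while its navigation links do not; plenty of sites break that, so treat the result as a starting guess to review by hand. `path_template` and `guess_child_pattern` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlparse


def path_template(href):
    """Map '/co/7/dept/hr' to '/co/*/dept/*'.

    Assumption: all-digit parent segments and the final segment vary per
    item. Returns None for the site root, which has no child shape.
    """
    path = urlparse(href).path.rstrip("/")
    parent, _, leaf = path.rpartition("/")
    if not leaf:
        return None
    segments = ["*" if s.isdigit() else s for s in parent.split("/")]
    return "/".join(segments) + "/*"


def guess_child_pattern(hrefs, min_count=2):
    """Return the most common template among `hrefs`, or None if no
    template occurs at least `min_count` times."""
    counts = Counter(t for t in map(path_template, hrefs) if t is not None)
    if not counts:
        return None
    template, n = counts.most_common(1)[0]
    return template if n >= min_count else None


if __name__ == "__main__":
    links = ["/", "/about", "/contact", "/dept/sales", "/dept/hr", "/dept/it"]
    pattern = guess_child_pattern(links)
    children = [h for h in links if path_template(h) == pattern]
    print(pattern, children)
    # → /dept/* ['/dept/sales', '/dept/hr', '/dept/it']
```

The guessed pattern could then drive a link filter per site, so one generic crawler replaces most of the per-site scripts, with manual overrides for the sites where the guess is wrong.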
Anon84 almost 17 years ago
Check out the search.wikia.org project. They make their crawler (and crawl data) available. Maybe you can get away with using theirs. That would *really* be cheap!
gaius almost 17 years ago
I am unable to think of an application for this technology other than spamming. Care to provide more details before we shoot ourselves in the face by helping you?