Hi,<p>A potential research app requires scraping a few hundred websites/forms and following the child links to recover the parent/child structure, i.e. company->dept->title->name.<p>In this case, that means going 4 levels deep and getting the required information.<p>So, does anyone know of a method/app/company that can be used to accomplish this? Or am I going to have to figure out how to get a number of cheap guys to write a bunch of Python scripts?<p>Thanks
Libraries like mechanize and hpricot are flattening the learning curve for scraping tasks. That's not to say it is easy, but it should not take a bunch of people working on it. One good developer with proper experience would be ample, in my opinion.
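To give a feel for the size of the job, here is a rough sketch of a depth-limited crawl that keeps the parent chain for every link it follows. It uses only the Python standard library rather than mechanize or hpricot, and the names `LinkParser`, `fetch`, and `crawl` are made up for illustration. It also assumes each level of the hierarchy is just a page of plain links whose link text is the label you want, which real sites will rarely be so kind about.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(url):
    # Plain GET; a real crawler would want timeouts, retries and polite delays.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(url, depth, fetch=fetch, path=(), seen=None):
    """Return label paths such as (company, dept, title, name)."""
    seen = set() if seen is None else seen
    # Stop at the depth limit or on a page we have already expanded.
    if depth == 0 or url in seen:
        return [path] if path else []
    seen.add(url)
    parser = LinkParser()
    parser.feed(fetch(url))
    parser.close()
    rows = []
    for href, text in parser.links:
        child = urljoin(url, href)  # resolve relative links against the page URL
        rows.extend(crawl(child, depth - 1, fetch, path + (text,), seen))
    # A page with no links is a leaf: report the path that led to it.
    return rows or ([path] if path else [])
```

Starting from a page that lists companies, `crawl(start_url, 4)` would return one tuple per chain of links followed, up to four labels long. The hard part in practice is the per-site logic for deciding which links count as "dept" or "title", and that is where the experienced developer earns the money.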
Or get one expensive guy to write you a script that<p>writes the scripts to scrape the sites
by scraping the sites
to read the structure
to write the scripts to scrape the sites.
Check out the search.wikia.org project. They make their crawler (and crawl data) available. Maybe you can get away with using theirs. That would <i>really</i> be cheap!
I am unable to think of an application for this technology other than spamming. Care to provide more details before we shoot ourselves in the face by helping you?