科技回声

Hi. A potential research app requires scraping a few hundred websites/forms and following the child links to capture the linked/parent structure, i.e. company -> dept -> title -> name. In this case that means going 4 levels deep and pulling the required information at each level. So, does anyone know of a method/app/company that can be used to accomplish this? Or am I going to have to figure out how to get a number of cheap guys to write a bunch of Python scripts? Thanks.

"...cheap guys to write a bunch of python scripts"

You know what will be 'cheap'? Writing it yourself.

Libraries like mechanize and hpricot are shrinking the curve for scraping tasks. That's not to say it is easy, but it should not take a bunch of people working on it. One good developer with proper experience would be ample in my opinion.
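To give a feel for how small the core of such a script can be, here is a rough sketch in Python using only the standard library (not mechanize or hpricot). It fetches pages up to `max_depth` levels deep and records the links found on each as a nested dict, which is one way to keep the company -> dept -> title -> name hierarchy. The names `LinkParser`, `fetch`, and `crawl` are made up for illustration, and it assumes every link on a page is a child worth following; a real site would need per-site filtering, form handling, and polite rate limiting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page as text (hypothetical default; swap in your own)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(url, max_depth=4, fetch_page=fetch, seen=None):
    """Return a nested dict {child_url: {grandchild_url: {...}}} for url.

    Pages are fetched up to max_depth levels deep; links found on the
    deepest fetched pages are recorded but not followed.
    """
    seen = set() if seen is None else seen
    if max_depth == 0 or url in seen:
        return {}
    seen.add(url)
    parser = LinkParser()
    parser.feed(fetch_page(url))
    tree = {}
    for href in parser.links:
        child = urljoin(url, href)  # resolve relative links against the page URL
        if child not in seen:       # skip pages already visited (avoids loops)
            tree[child] = crawl(child, max_depth - 1, fetch_page, seen)
    return tree
```

Calling `crawl(start_url, max_depth=4)` on a company's top page would return the nested structure for that one site. Passing your own `fetch_page` function lets you add caching or delays, or try the logic on canned HTML before touching a live site.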

Or get one expensive guy to write you a script that writes the scripts to scrape the sites by scraping the sites to read the structure to write the scripts to scrape the sites.

Check out the search.wikia.org project. They make their crawler (and crawl data) available. Maybe you can get away with using theirs. That would really be cheap!

I am unable to think of an application for this technology other than spamming. Care to provide more details before we shoot ourselves in the face by helping you?

Ask YC: How to web scrape 100s of sites/forms?

5 comments
