Tech Echo (科技回声)
© 2025 Tech Echo. All rights reserved.

Ask YC: How to web scrape 100s of sites/forms?

1 point by bdouglas, almost 17 years ago
hi...

a potential research app requires scraping a few hundred websites/forms and diving into the child links to obtain the linked/parent structure, i.e. company->dept->title->name.

in this case, this would involve going 4 levels deep and getting the required information.

so, does anyone know of a method/app/company that can be used to accomplish this? or am i going to have to figure out how to get a number of cheap guys to write a bunch of python scripts!!

thanks
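A minimal sketch of the 4-level crawl described above, using only the Python standard library. It assumes each level of the company->dept->title->name hierarchy is reachable through ordinary `<a href>` links on the same host; `crawl`, `fetch_url` and `LinkExtractor` are hypothetical names for illustration, and each real site would still need its own code to pull the actual fields out of the pages.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Download a page over HTTP(S) and decode it as text."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str] = fetch_url,
          max_depth: int = 4) -> dict[str, list[str]]:
    """Breadth-first crawl that fetches pages up to max_depth levels deep.

    The start page is level 1, so max_depth=4 fetches company, dept, title
    and name pages. Returns {page URL: [same-host child URLs]}, i.e. the
    parent/child structure of every page that was fetched.
    """
    host = urlparse(start_url).netloc
    tree: dict[str, list[str]] = {}
    seen = {start_url}
    queue = deque([(start_url, 1)])
    while queue:
        url, level = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        children: list[str] = []
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            child, _fragment = urldefrag(urljoin(url, href))
            if urlparse(child).netloc != host:
                continue  # stay on the site being scraped
            children.append(child)
            # Only queue children that are still within the depth limit.
            if level < max_depth and child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
        tree[url] = children
    return tree
```

Passing `fetch` in as a parameter keeps the crawl logic testable against canned HTML. Before pointing it at real sites, check each site's robots.txt (the standard library's `urllib.robotparser` can read it) and rate-limit the requests.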

5 comments

nreece, almost 17 years ago
"...cheap guys to write a bunch of python scripts"

You know what will be 'cheap'? Writing it yourself.
qhoxie, almost 17 years ago
Libraries like mechanize and hpricot are shrinking the curve for scraping tasks. That's not to say it is easy, but it should not take a bunch of people working on it. One good developer with proper experience would be ample in my opinion.
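For the "forms" half of the question, a library such as mechanize fills in and submits forms for you. As a rough stand-in that uses only the standard library, the sketch below reads a form's `action`, `method` and named `<input>` defaults from the HTML and builds the URL-encoded request body. `FormParser` and `build_submission` are hypothetical names, and the sketch deliberately ignores `<select>`, `<textarea>`, checkbox/radio semantics and JavaScript-driven forms.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Records each <form>'s action, method and named <input> default values."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: list[dict] = []
        self._current = None  # the form whose inputs are being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Unnamed inputs (e.g. most submit buttons) are never sent.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None  # inputs outside a form are ignored


def build_submission(page_url: str, html: str, overrides: dict[str, str],
                     form_index: int = 0) -> tuple[str, str, str]:
    """Return (absolute action URL, method, URL-encoded body) for one form."""
    parser = FormParser()
    parser.feed(html)
    form = parser.forms[form_index]
    fields = {**form["fields"], **overrides}  # user values win over defaults
    action = urljoin(page_url, form["action"])  # empty action means "this page"
    return action, form["method"], urlencode(fields)
```

Sending the result is then `urllib.request.urlopen(action, data=body.encode())` for a POST form, or appending `"?" + body` to the action URL for a GET form.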
olefoo, almost 17 years ago
Or get one expensive guy to write you a script that writes the scripts to scrape the sites by scraping the sites to read the structure to write the scripts to scrape the sites.
Anon84, almost 17 years ago
Check out the search.wikia.org project. They make their crawler (and crawl data) available. Maybe you can get away with using theirs. That would *really* be cheap!
gaius, almost 17 years ago
I am unable to think of an application for this technology other than spamming. Care to provide more details before we shoot ourselves in the face by helping you?