Ask HN: Open source focused crawler?

6 points by cookerware, over 11 years ago
Is there an open source crawler/library that will recursively follow only links under a certain XPath and ignore the rest?

I don't want to do an exhaustive crawl of every single link; I want something that will only follow links under a main content area.

3 comments

sheraz, over 11 years ago
I highly recommend Scrapy (http://www.scrapy.org).

From their site:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
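
For the use case in the question, Scrapy's `LinkExtractor` accepts a `restrict_xpaths` argument that limits link extraction to the matched regions. The sketch below is not Scrapy code; it illustrates the same idea using only the Python standard library. It assumes pages are well-formed XML (XHTML), because `xml.etree.ElementTree` supports only a subset of XPath and cannot parse arbitrary HTML. The names `links_under` and `focused_crawl`, the `fetch` callable, and the example.com URLs are hypothetical and chosen for illustration.

```python
from collections import deque
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


def links_under(markup, base_url, xpath):
    """Return absolute URLs of <a href> elements inside regions matching xpath.

    Assumption: markup is well-formed XML (e.g. XHTML).
    """
    root = ET.fromstring(markup)
    found = []
    for region in root.findall(xpath):
        for anchor in region.iter("a"):
            href = anchor.get("href")
            if href:
                url = urljoin(base_url, href)
                if url not in found:
                    found.append(url)
    return found


def focused_crawl(start_url, fetch, xpath, max_pages=50):
    """Breadth-first crawl that only follows links found under xpath.

    fetch is a hypothetical callable: url -> markup string, or None if
    the page is unavailable. Returns the URLs visited, in order.
    """
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.append(url)
        markup = fetch(url)
        if markup is None:
            continue
        try:
            links = links_under(markup, url, xpath)
        except ET.ParseError:
            # Not well-formed XML; a real crawler would use an HTML parser here.
            continue
        queue.extend(link for link in links if link not in visited)
    return visited


if __name__ == "__main__":
    # In-memory stand-in for a website, so the sketch runs without network access.
    site = {
        "http://example.com/": (
            '<html><body><div id="nav"><a href="/about">About</a></div>'
            '<div id="content"><a href="/a">A</a></div></body></html>'
        ),
        "http://example.com/a": (
            '<html><body><div id="content"><a href="/b">B</a></div></body></html>'
        ),
        "http://example.com/b": "<html><body><p>No links here</p></body></html>",
    }
    for page in focused_crawl("http://example.com/", site.get, ".//div[@id='content']"):
        print(page)
```

Passing `fetch` in as a parameter keeps the link-filtering logic separate from the network layer; for real pages, an HTML-tolerant parser with full XPath support would be a more realistic choice than `ElementTree`.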
techaddict009, over 11 years ago
Check this out: http://commoncrawl.org/

It's not exactly what you are looking for, but it might help you.
forkrulassail, over 11 years ago
Have you tried BeautifulSoup?