Show HN: Scraperjs – A versatile web scraper

192 points by ruipgil, almost 11 years ago

13 comments

brianzelip, almost 11 years ago
It's unclear to me how to actually run this. Executing only the two commands listed under the Installing section does not run it. I had to `cd` into the scraperjs dir, then `npm install`, then continue with the second Install command (`grunt test`) to actually test.

Also, do you install scraperjs into each project directory you want to use it for? Or just install it once?
jasode, almost 11 years ago
It would be helpful if the documentation compared how Scraperjs is different from, or better than, CasperJS for scraping. CasperJS is the older and better-known wrapper around PhantomJS, so a comparison would help people decide which tool is appropriate.

http://casperjs.org/
halcyondaze, almost 11 years ago
If you're interested in scraping in Python, then I recommend giving this a read: http://jakeaustwick.me/python-web-scraping-resource/
justboxing, almost 11 years ago
This is awesome. I am very new to scraping, so bear with me if this is very obvious.

Would it be possible to start from a list of URLs on a home page (e.g. a list of marathon runners), follow the link in each name to that runner's stats page, and save the scraped data as JSON to a text file on the local machine, in a folder such as C:\Runners\Data\?

Also, does anyone know of a reliable and tested C# / .NET / ASP.NET web page scraper?
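A rough sketch of the list-page-then-detail-page crawl asked about above, in plain JavaScript for Node. This is not Scraperjs's API: `crawlTwoLevels`, `resolveLinks`, and the three injected functions (`fetchPage`, `extractLinks`, `extractStats`) are hypothetical names, and the actual page fetching and HTML parsing are left to whatever library you use.

```javascript
// Hypothetical sketch: none of these names come from Scraperjs.
// `fetchPage(url)` is assumed to return a Promise of the page's HTML string.
// `extractLinks(html)` and `extractStats(html)` hold the site-specific parsing.

// Resolve possibly relative hrefs against the page they were found on.
function resolveLinks(baseUrl, hrefs) {
  return hrefs.map((href) => new URL(href, baseUrl).href);
}

// Visit the list page, then each linked detail page, one at a time.
async function crawlTwoLevels(listUrl, { fetchPage, extractLinks, extractStats }) {
  const listHtml = await fetchPage(listUrl);
  const detailUrls = resolveLinks(listUrl, extractLinks(listHtml));

  const records = [];
  for (const url of detailUrls) {
    const html = await fetchPage(url);
    // Keep the source URL next to whatever fields the parser returns.
    records.push({ url, ...extractStats(html) });
  }
  return records;
}
```

The returned array can then be saved with Node's `fs.writeFileSync(path, JSON.stringify(records, null, 2))`, where `path` points into a folder like the one in the question. Fetching the detail pages sequentially is slower than fetching them in parallel but is gentler on the target site.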
jdrock, almost 11 years ago
Let us know if you'd like to integrate this with http://www.80legs.com!
andrejewski, almost 11 years ago
If anyone is interested in just scraping links between webpages with JavaScript, I made Slinky (https://github.com/andrejewski/slinky). The API is simple and easily overridable.
pibefision, almost 11 years ago
Could someone recommend a similar framework, but Ruby-based? Just because I'm more skilled in Ruby than in Node (not for trolling purposes).

I've been exploring GitHub but could not find a well-maintained framework (or at least one updated in the last month).
jwarren, almost 11 years ago
Nice! Could've used that this weekend when I got caught in callback hell trying to build a simple NodeJS scraper. Ended up doing it in PHP just because I know it well.

I'll give it another go with this library next week!
roux_rc, almost 11 years ago
Artoo is soooo much better :) https://medialab.github.io/artoo/
bshimmin, almost 11 years ago
I really like the router aspect of this. That's a nice idea and not (to the best of my limited memory) one I can recall seeing in any other scraper.
mr5iff, almost 11 years ago
I don't quite get the point of the DynamicScraper... Any real use cases for that?
novaleaf, almost 11 years ago
If you want a scraper as a service, you can try: https://PhantomJsCloud.com

Disclaimer: I wrote it.
woah, almost 11 years ago
Looks pretty good, shame about the promises.