科技回声

13 comments

brianzelip, nearly 11 years ago

It's unclear to me how to actually run this. Executing only the two commands listed under the Installing section does not run it: I had to `cd` into the scraperjs dir, then `npm install`, then continue with the second Install command (`grunt test`) to actually test.

Also, do you install scraperjs into each project directory you want to use it for? Or just install it once?

jasode, nearly 11 years ago

It would be helpful if the documentation compared how Scraperjs is different from, or better than, CasperJS for scraping. CasperJS is the older and better-known wrapper around PhantomJS, so a comparison would help people decide which tool is appropriate.

http://casperjs.org/

halcyondaze, nearly 11 years ago

If you're interested in scraping in Python, then I recommend giving this a read: http://jakeaustwick.me/python-web-scraping-resource/

justboxing, nearly 11 years ago

This is awesome. I am very new to scraping, so bear with me if this is very obvious.

Would it be possible to follow a list of URLs from a home page (e.g. a list of marathon runners), then follow the link in each runner's name to their stats page, and save the scraped data as JSON to a text file on the local machine, in a folder such as C:\Runners\Data\?

Also, does anyone know of a reliable and tested C# / .NET / ASP.NET web page scraper?
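
The two-stage crawl asked about above (index page, then each linked stats page, then a JSON file on disk) can be sketched in plain Node without relying on any scraper library. This is only an illustration under stated assumptions: `extractLinks`, `scrapeRunners`, `saveAsJson`, the `example.test` URLs, the `<span class="time">` markup, and the in-memory `fakePages` are all hypothetical, and nothing here is Scraperjs's own API. The page fetcher is passed in as a function so that a real HTTP client could be swapped in.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Pull each href and its link text out of an HTML string.
// A naive regex is enough for a sketch; a real scraper would use an HTML parser.
function extractLinks(html, baseUrl) {
  const links = [];
  const re = /<a\s[^>]*href="([^"]+)"[^>]*>([^<]*)<\/a>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    links.push({ name: m[2].trim(), url: new URL(m[1], baseUrl).href });
  }
  return links;
}

// Stage 1: read the index page. Stage 2: visit each linked page and pull one field.
// The `time` span is an assumed page layout, not taken from any real site.
async function scrapeRunners(indexUrl, fetchPage) {
  const indexHtml = await fetchPage(indexUrl);
  const runners = [];
  for (const link of extractLinks(indexHtml, indexUrl)) {
    const statsHtml = await fetchPage(link.url);
    const match = /<span class="time">([^<]+)<\/span>/.exec(statsHtml);
    runners.push({ name: link.name, url: link.url, time: match ? match[1] : null });
  }
  return runners;
}

// Write the records as pretty-printed JSON, creating the folder if needed.
function saveAsJson(records, filePath) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(records, null, 2), 'utf8');
}

// Hypothetical in-memory pages standing in for a real site.
const fakePages = {
  'http://example.test/runners': '<a href="/runners/1">Ann</a> <a href="/runners/2">Bob</a>',
  'http://example.test/runners/1': '<span class="time">3:01:10</span>',
  'http://example.test/runners/2': '<span class="time">3:45:02</span>',
};
const fakeFetch = async (url) => fakePages[url];

scrapeRunners('http://example.test/runners', fakeFetch).then((runners) => {
  saveAsJson(runners, path.join(os.tmpdir(), 'runners', 'data.json'));
});
```

The output path is an ordinary argument, so a Windows folder like the one in the question could be passed instead of the temporary directory used here.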

jdrock, nearly 11 years ago

Let us know if you'd like to integrate this with http://www.80legs.com!

andrejewski, nearly 11 years ago

If anyone is interested in just scraping links between webpages with JavaScript, I made Slinky (https://github.com/andrejewski/slinky). The API is simple and easily overridable.

pibefision, nearly 11 years ago

Could someone recommend a similar framework, but Ruby-based? Just because I'm more skilled in Ruby than in Node (not for trolling purposes).

I've been exploring GitHub but could not find a well-maintained framework (or at least one updated within the last month).

jwarren, nearly 11 years ago

Nice! Could've used that this weekend when I got caught in callback hell trying to build a simple NodeJS scraper. Ended up doing it in PHP just because I know it well.

I'll give it another go with this library next week!

roux_rc, nearly 11 years ago

Artoo is soooo much better :) https://medialab.github.io/artoo/

bshimmin, nearly 11 years ago

I really like the router aspect of this. That's a nice idea and not (to the best of my limited memory) one I can recall seeing in any other scraper.

mr5iff, nearly 11 years ago

I don't quite get the point of the DynamicScraper... Any real use cases for that?

novaleaf, nearly 11 years ago

If you want a scraper as a service, you can try: https://PhantomJsCloud.com

Disclaimer: I wrote it.

woah, nearly 11 years ago

Looks pretty good, shame about the promises.

Show HN: Scraperjs – A versatile web scraper
