Web Scraping with Node.js and Chimera

98 points by dandrewsen over 12 years ago

11 comments

kanzure over 12 years ago
Great project. The biggest question for me when I'm using phantomjs is why phantomjs is trying to replicate nodejs infrastructure. For example, phantomjs has an HTTP server feature for processing incoming requests. This doesn't make sense to me because a browser shouldn't be a server. If you need to get information out of the worker, you should POST it somewhere. The proclivity of phantomjs users to prefer stdout is astounding. It's definitely the #1 question or issue that I field in #phantomjs on freenode.

For example, for POSTing and reading from redis/resque I wrote this (proof of concept, not what's in production):

https://gist.github.com/000037f472b72d9490a6

A few thoughts...

> There are similar "glues" like phantomjs-node that integrate phantomjs by spawning a process, and processing the stdout stream, but it is limited by what can be done via the command line of phantomjs. If you really want direct api access to the browser, the best way is via direct integration.

This seems like a lot of overhead on top of a phantomjs (or even just a generic webkit) worker. Substack's approach was to just put a proxy in front of a browser that injects a <script> tag into the page to boss the browser around:

https://github.com/substack/schoolbus

Supposedly the actual browser client shouldn't matter, as long as your fleet of workers is up and running. I bet chimera's approach will end up with more access to npm modules in the long run compared to phantomjs.

Also, the link wasn't in the article: https://github.com/deanmao/node-chimera

For the python equivalent of this project, there's https://github.com/kanzure/pyphantomjs
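For readers unfamiliar with the proxy approach mentioned above, here is a minimal sketch of the injection step only: rewriting an HTML response so that it loads a control script. It is not taken from schoolbus; the function name `injectScript` and the `/control.js` URL are hypothetical, and a real proxy would also have to handle headers, encodings, and streaming.

```javascript
// Hypothetical helper, not part of schoolbus or any real proxy library.
// Inserts a <script> tag just before the last </body>, or appends it
// to the end of the document if no closing body tag is found.
function injectScript(html, scriptUrl) {
  const tag = '<script src="' + scriptUrl + '"></script>';
  // Case-insensitive search, since real-world markup may use </BODY>.
  const idx = html.toLowerCase().lastIndexOf("</body>");
  if (idx === -1) {
    return html + tag;
  }
  return html.slice(0, idx) + tag + html.slice(idx);
}

// Example: the proxy would apply this to each HTML response body.
const page = "<html><body><p>hi</p></body></html>";
console.log(injectScript(page, "/control.js"));
```

The injected script would then be responsible for reporting back to the proxy, for instance by POSTing results to it, which fits the point above about not relying on stdout.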
niggler over 12 years ago
Have we really reached the point where demonstration code can be presented in CoffeeScript without an equivalent JavaScript demo?
fruchtose over 12 years ago
Great work! This might even have potential for browser-based testing. Since mocha-phantomjs runs from an executable, I'd prefer a code-based solution like Chimera that integrates with Mocha.
lancefisher over 12 years ago
This is a great idea! phantomjs-node works okay, but it is such a hack. A nifty hack, but still. https://github.com/sgentle/phantomjs-node#how-does-it-work

If you want to parse the DOM for the internet at large, you need a real browser. There are simply too many sites with really bad HTML to be parsed reliably with anything else.
seanlinehan over 12 years ago
This looks really great. I would love to see a bit more of an in-depth example in a follow-up post!

Is more documentation to come?
Trindaz over 12 years ago
Has anyone actually gotten this working? I've tried installing on Mac OS X and Ubuntu, and hit various problems on both. The precompiled binaries don't work, the Qt build scripts fail, etc.
rco8786 over 12 years ago
Is there more documentation (or source) available somewhere?
mcantelon over 12 years ago
Similar project: https://github.com/LearnBoost/tobi
nodemaker over 12 years ago
Testing (http://bartaz.github.com/impress.js/#/bored)
goldfeld over 12 years ago
How does it differ from ZombieJS?
booz over 12 years ago
wow this is exactly what I need, thanks!