TechEcho

4 comments

simonw over 14 years ago

I was intrigued to see what CSS selector engine it was using...

https://github.com/chriso/node.io uses https://github.com/harryf/node-soupselect

https://github.com/harryf/node-soupselect is a port of my https://github.com/simonw/soupselect library for Python

https://github.com/simonw/soupselect is a port of my getElementsBySelector function for JavaScript: http://simonwillison.net/2003/Mar/25/getElementsBySelector/

I'm always surprised to see that code still being used - it's the least complete selector library out there by a long way.
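For readers wondering what a small selector engine of this kind does, here is a rough sketch in JavaScript. It is not the code of getElementsBySelector, soupselect or node-soupselect; it assumes a hypothetical tree of plain objects (`tag`, `id`, `classes`, `children`) instead of a real DOM, and it supports only the descendant combinator with `tag`, `#id` and `.class` parts. The function names `parseSimple`, `matches`, `descendants` and `select` are illustrative.

```javascript
// Sketch of a descendant-only CSS selector engine.
// Hypothetical node shape: { tag: "div", id: "main", classes: ["a"], children: [...] }

// Parse one simple selector such as "a", ".title", "a.title" or "div#main".
// Only the order tag, then #id, then .class is accepted in this sketch.
function parseSimple(token) {
  const m = /^([a-z0-9]*)(?:#([\w-]+))?(?:\.([\w-]+))?$/i.exec(token);
  if (!m) throw new Error("Unsupported selector token: " + token);
  return { tag: m[1] ? m[1].toLowerCase() : null, id: m[2] || null, cls: m[3] || null };
}

// Check a single node against a parsed simple selector.
function matches(node, sel) {
  if (sel.tag && node.tag !== sel.tag) return false;
  if (sel.id && node.id !== sel.id) return false;
  if (sel.cls && !(node.classes || []).includes(sel.cls)) return false;
  return true;
}

// All descendants of `node` (excluding node itself), depth-first.
function descendants(node) {
  const out = [];
  for (const child of node.children || []) {
    out.push(child, ...descendants(child));
  }
  return out;
}

// Resolve a space-separated selector: each step replaces the current
// context with the matching descendants of the previous step's results.
function select(root, selector) {
  const steps = selector.trim().split(/\s+/).map(parseSimple);
  let context = [root];
  for (const step of steps) {
    const next = [];
    for (const ctx of context) {
      for (const node of descendants(ctx)) {
        if (matches(node, step) && !next.includes(node)) next.push(node);
      }
    }
    context = next;
  }
  return context;
}

// Usage on a tiny hand-built tree.
const doc = {
  tag: "html", children: [
    { tag: "body", children: [
      { tag: "div", id: "main", children: [
        { tag: "a", classes: ["title"], text: "First" },
        { tag: "a", text: "Other" },
        { tag: "a", classes: ["title"], text: "Second" },
      ] },
    ] },
  ],
};
console.log(select(doc, "div#main a.title").map(n => n.text)); // [ 'First', 'Second' ]
```

Each step re-scans the descendants of every node matched so far, which keeps the code short but gets slow on large pages. Child and sibling combinators, attribute selectors and pseudo-classes are left out of this sketch entirely.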

marcusramberg over 14 years ago

http://mojolicio.us is way better for this kind of stuff. Here's the synopsis example redone using Mojo:

  $ perl -Mojo -e'g("reddit.com")->dom("a.title")->each(sub { warn shift->text })'

thibaut_barrere over 14 years ago

Really interesting, thanks! This will probably be the first thing I use for real projects in node.js. Does anyone know how it compares to, say, Nokogiri or Hpricot, both in terms of speed and in terms of its ability to handle crappy HTML?

chrisohara over 14 years ago

This is in response to all the node/jsdom/jquery scraping posts that have been popular lately. JSDom is hopeless for scraping: try parsing some slightly malformed HTML.

Pro scraping with Node.JS