TechEcho

hafabnew almost 13 years ago

From the docs:

'''
* Node.js [...]
* jQuery [...]
[...]
This approach has become my hammer when web scraping tasks come up.
'''

If all you have is a hammer, you may find yourself noticing that objects become more nail-like :).

wskinner almost 13 years ago

I have also found node+jQuery an effective web crawling combination. In particular, the cheerio library (https://github.com/MatthewMueller/cheerio) greatly simplifies data extraction. And as others have mentioned, the asynchronous nature of node is perfectly suited to crawling (as long as you take care not to accidentally DDoS the target site).
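One way to avoid hammering the target site is to cap how many requests are in flight at once. The following is only a minimal sketch of that idea in plain Node.js; `mapWithConcurrency` and its parameters are hypothetical names made up for illustration, not part of cheerio or any other library mentioned in this thread, and it assumes `limit` is at least 1.

```javascript
// Hypothetical helper (not from any library): run `worker` over `items`
// with at most `limit` calls in flight at the same time.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each runner keeps claiming the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start at most `limit` runners; they share the `next` counter.
  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // same order as `items`
}
```

Here `worker` would be whatever function fetches and parses a single page, for example an assumed `fetchPage(url)` of your own that returns a promise. Adding a small delay inside the worker would throttle the crawl further.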

latchkey almost 13 years ago

If you really want to scrape pages, you should use something like https://github.com/chriso/node.io/, which batches things into jobs and helps with error handling, I/O, etc.

blyxa almost 13 years ago

Why not use the Twitter API?

danso almost 13 years ago

Does Node have anything like Mechanize? Handling cookie state and such is much more useful than the selector functionality of jQuery, which is great but not any better than what Nokogiri offers.

Using Node.js and JQuery to Crawl Public Tweets

5 comments
