Using Node.js and jQuery to Crawl Public Tweets

35 points by BenjaminCoe almost 13 years ago

5 comments

hafabnew almost 13 years ago
From the docs:

'''
* Node.js [...]
* jQuery [...]
[...]
This approach has become my hammer when web scraping tasks come up.
'''

If all you have is a hammer, you may find yourself noticing that objects become more nail-like :).
wskinner almost 13 years ago
I have also found node+jQuery an effective web crawling combination. In particular the cheerio library (https://github.com/MatthewMueller/cheerio) greatly simplifies data extraction. And as others have mentioned, the asynchronous nature of node is perfectly suited to crawling (as long as you take care not to accidentally DDOS the target site).
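One way to keep an asynchronous crawler from flooding a site is to cap the number of requests in flight. The sketch below is only an illustration in current Node.js, not anything from the linked article: `mapWithLimit` is a hypothetical helper name, and `worker` stands in for whatever function fetches and parses a page.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` runners; they share the `next` counter.
  const runners = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

Called as `mapWithLimit(urls, 2, fetchPage)`, with `fetchPage` being your own async function, this would keep at most two requests open at a time. A polite crawler would probably also add a delay between requests, which this sketch leaves out.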
latchkey almost 13 years ago
If you really want to scrape pages, you should use something like https://github.com/chriso/node.io/ which batches things in jobs, helps with error handling, io, etc...
blyxa almost 13 years ago
Why not use the Twitter API?
danso almost 13 years ago
Does Node have anything like Mechanize? Handling cookie state and such is something that is much more useful than the selector functionality of jQuery...which is great, but not any better than what Nokogiri offers.
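To make the cookie-state point concrete, here is a rough sketch of the bookkeeping a Mechanize-style tool does between requests. It is not a Mechanize equivalent: `SimpleCookieJar` is a hypothetical class, and it deliberately ignores Domain, Path, Expires and Secure attributes, so treat it as a single-site illustration only.

```javascript
// Minimal in-memory cookie store for one site.
// Assumption: cookie attributes (Domain, Path, Expires, Secure) are ignored.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Accepts one Set-Cookie header value or an array of them.
  store(setCookieHeaders) {
    const headers = [].concat(setCookieHeaders || []);
    for (const header of headers) {
      const pair = header.split(";")[0]; // "name=value" comes first
      const eq = pair.indexOf("=");
      if (eq < 1) continue; // skip entries without a name
      this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }

  // Builds the value to send back in the Cookie request header.
  header() {
    return [...this.cookies].map(([k, v]) => `${k}=${v}`).join("; ");
  }
}
```

With Node's built-in `http` module, the response's `headers['set-cookie']` is an array that could be passed to `store()`, and `header()` would supply the `Cookie` header for the next request.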