TechEcho (科技回声)
Using Node.js and JQuery to Crawl Public Tweets

35 points by BenjaminCoe, almost 13 years ago

5 comments

hafabnew, almost 13 years ago
From the docs:

'''
* Node.js [...]
* jQuery [...]
[...]
This approach has become my hammer when web scraping tasks come up.
'''

If all you have is a hammer, you may find yourself noticing that objects become more nail-like :).
wskinner, almost 13 years ago
I have also found node+jQuery an effective web crawling combination. In particular, the cheerio library (https://github.com/MatthewMueller/cheerio) greatly simplifies data extraction. And as others have mentioned, the asynchronous nature of node is perfectly suited to crawling (as long as you take care not to accidentally DDoS the target site).
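One way to avoid accidentally hammering the target site is to cap the number of requests in flight. The following is only a minimal sketch using plain Node.js with no libraries; `crawlWithLimit` and the `fetchPage` callback are hypothetical names introduced for illustration, not part of cheerio or any other tool mentioned here.

```javascript
// Minimal sketch: run an async worker over a list of URLs with at most
// `limit` requests in flight at once. `crawlWithLimit` and `fetchPage`
// are hypothetical names, not part of any library mentioned in the thread.
async function crawlWithLimit(urls, limit, fetchPage) {
  const results = new Array(urls.length);
  let next = 0; // index of the next URL to hand out

  // Each runner claims the next unprocessed URL until the list is exhausted.
  async function runner() {
    while (next < urls.length) {
      const i = next++;
      try {
        results[i] = { url: urls[i], value: await fetchPage(urls[i]) };
      } catch (err) {
        // Record the failure instead of aborting the whole crawl.
        results[i] = { url: urls[i], error: err };
      }
    }
  }

  // Start at most `limit` runners; together they drain the list.
  const runners = [];
  for (let r = 0; r < Math.min(limit, urls.length); r++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

Because Node runs JavaScript on a single thread, the shared `next` counter needs no locking. A real crawler would probably also add a delay between requests to the same host.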
latchkey, almost 13 years ago
If you really want to scrape pages, you should use something like https://github.com/chriso/node.io/, which batches things in jobs and helps with error handling, I/O, etc.
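To give a rough idea of what batching and error handling involve, here is a small sketch in plain Node.js. It is not node.io's API; `toBatches` and `withRetry` are hypothetical helpers, and the linear backoff is an assumption chosen for simplicity.

```javascript
// Hypothetical helper: split a list into fixed-size batches so that each
// batch can be processed as one job.
function toBatches(items, size) {
  if (size < 1) throw new RangeError('size must be at least 1');
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: retry an async job up to `attempts` times,
// rethrowing the last error if every attempt fails.
async function withRetry(job, attempts, delayMs) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await job(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Wait a little longer after each failure (linear backoff).
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

A job here would typically be one batch of URLs, wrapped as `withRetry(() => processBatch(batch), 3, 500)`, where `processBatch` is whatever scraping function you supply.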
blyxa, almost 13 years ago
why not use the twitter api?
danso, almost 13 years ago
Does Node have anything like Mechanize? Handling cookie state and such is something that is much more useful than the selector functionality of jQuery...which is great, but not any better than what Nokogiri offers.
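For a sense of what handling cookie state means at its simplest, the sketch below keeps cookies between requests by hand. It is nowhere near Mechanize: `SimpleCookieJar` is a hypothetical class that stores only name=value pairs and ignores attributes such as Path, Domain, Expires and Secure.

```javascript
// Minimal cookie jar sketch (hypothetical, not a real library). It keeps
// only name=value pairs and ignores all cookie attributes, so it is not a
// substitute for a proper cookie implementation.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Accepts the array of Set-Cookie header values from a response.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders || []) {
      const pair = header.split(';')[0]; // drop attributes after the first ';'
      const eq = pair.indexOf('=');
      if (eq < 1) continue; // skip entries without a name
      this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }

  // Builds the value to send in the Cookie request header.
  header() {
    return [...this.cookies].map(([name, value]) => `${name}=${value}`).join('; ');
  }
}
```

With Node's built-in `http` module, `response.headers['set-cookie']` is an array of strings, so it can be passed straight to `store`, and `header()` can be set as the `Cookie` header on the next request.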