Free SQL dump with 200 million tweets from 13 million users

98 points by _hfqa, almost 14 years ago

About the data:

- DB size: 543 million rows
- Data size: 173GB (uncompressed)
- Stored in MySQL
- 200+ million tweets from 13+ million users
- Collected in 1 week
- Operation costs: 100+ dollars
- Rackspace Cloud: 1 CentOS server with 8GB RAM
- Java, memcache, MySQL and Perl for core processing
- JS, PHP for analytics & visualization

Download the data at this URL: http://www.archive.org/details/2011-06-calufa-twitter-sql
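For a rough sense of scale, the quoted figures can be turned into per-row and per-user averages. This is only a back-of-envelope sketch: it assumes "GB" means 10^9 bytes and treats the "200+" and "13+" counts as exact, so the results are estimates at best.

```python
# Figures quoted in the post. "200+" and "13+" are lower bounds,
# treated as exact values here (an assumption).
ROWS = 543_000_000
DATA_BYTES = 173 * 10**9  # assumption: 1 GB = 10^9 bytes
TWEETS = 200_000_000
USERS = 13_000_000

bytes_per_row = DATA_BYTES / ROWS
tweets_per_user = TWEETS / USERS

print(f"~{bytes_per_row:.0f} bytes per row")
print(f"~{tweets_per_user:.1f} tweets per user")
```

Under those assumptions this works out to roughly 319 bytes per row and about 15 tweets per user.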

14 comments

sethish, almost 14 years ago
Twitter changed their ToS to explicitly disallow distributing Twitter dumps like this: http://chronicle.com/blogs/profhacker/the-end-of-twapperkeeper-and-what-to-do-about-it/31582

I was a part of the webecology project (and 140kit.com), both of which gave large Twitter datasets to researchers.
jdvolz, almost 14 years ago
Calufa, next time you're in Vegas, send me a message and we'll get a beer. Thank you. You just made something I'm doing vastly more awesome.
StavrosK, almost 14 years ago
Torrent here, when done: http://burnbit.com/torrent/170493/twitter_sql_bz2
calufa, almost 14 years ago
Import to MySQL:

bunzip2 < my_database.sql.bz2 | mysql -h localhost -u root -p my_database
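The same pipeline can be sketched in Python for anyone who wants to script the import. This is a minimal sketch, not calufa's method: `stream_bz2_to_command` is a hypothetical helper name, and it simply decompresses the archive in chunks and feeds the bytes to another program's standard input, so the 173GB of SQL never has to be written to disk uncompressed.

```python
import bz2
import shutil
import subprocess


def stream_bz2_to_command(archive_path, command, chunk_size=1 << 20):
    """Decompress archive_path chunk by chunk into command's stdin.

    Returns the exit code of the command. `command` is an argument
    list as accepted by subprocess.Popen.
    """
    with bz2.open(archive_path, "rb") as src:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE)
        try:
            # Copy decompressed bytes without holding the whole dump in memory.
            shutil.copyfileobj(src, proc.stdin, chunk_size)
        finally:
            # Closing stdin signals end of input to the child process.
            proc.stdin.close()
        return proc.wait()


# Mirrors the shell command above (file and database names are placeholders):
# stream_bz2_to_command(
#     "my_database.sql.bz2",
#     ["mysql", "-h", "localhost", "-u", "root", "-p", "my_database"],
# )
```

For a one-off import the shell one-liner is simpler. The Python version mainly helps when the import is one step in a larger script.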
aonic, almost 14 years ago
Thanks! I'm more interested in the scraper. Is it open source? If yes, where can we download it? If not, can you write about your experience in building it?
JeeyoungKim, almost 14 years ago
Hey guys, what would be the most sane way to work with this dataset? If it's 173GB, it's probably hard to load it up on a single machine.
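One possible answer is to never load the dump at all and instead stream through the compressed file line by line. The sketch below assumes a mysqldump-style file with one `INSERT INTO` statement per line, which the thread does not confirm. The function name `count_inserts_per_table` is made up for illustration.

```python
import bz2
import re
from collections import Counter

# Matches the table name in lines like: INSERT INTO `some_table` VALUES (...)
INSERT_RE = re.compile(r"^INSERT INTO `?(\w+)`?", re.IGNORECASE)


def count_inserts_per_table(archive_path):
    """Count INSERT statements per table without decompressing to disk."""
    counts = Counter()
    # Text mode decodes lazily; errors="replace" keeps odd bytes from aborting the scan.
    with bz2.open(archive_path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            match = INSERT_RE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts
```

Only one line is held in memory at a time, so the same loop can be adapted to filter rows or to write a small sample to a separate file for experimentation.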
ck2, almost 14 years ago
Hmm, how many days back does it go?

Twitter search still only goes back 10 days in 2011, so how deep is this data?
laprise, almost 14 years ago
Neat! Here are some tips for creating a kick-ass graph visualization: http://www.martinlaprise.info/2010/02/15/visualize-your-own-twitter-graph-part-2/
nametoremember, almost 14 years ago
Damn, I just saw this. I would have liked to use it. How can Twitter make you take it down when it is all public information anyway?
calufa, almost 14 years ago
AN EMAIL FROM TWITTER KILLED THE DATASET --- :S
JeeyoungKim, almost 14 years ago
Does anybody want to share the MD5 hash of the file? I'm trying to decompress this file, and I keep getting an error.
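For comparing downloads, a hash can be computed in fixed-size chunks so a multi-gigabyte archive never has to fit in memory. The sketch below uses Python's standard `hashlib` and `bz2` modules. Both helper names are hypothetical, and the second one only checks whether the archive decompresses end to end. It says nothing about whether the file matches the original upload.

```python
import bz2
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bz2_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole archive decompresses without error."""
    try:
        with bz2.open(path, "rb") as f:
            # Read and discard the output; only errors matter here.
            while f.read(chunk_size):
                pass
    except (OSError, EOFError):
        # OSError: invalid data (or unreadable file); EOFError: truncated archive.
        return False
    return True
```

If two people get the same MD5 but decompression still fails, the uploaded file itself is likely damaged. If the hashes differ, the download is the more likely culprit.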
juiceandjuice, almost 14 years ago
Wow, I just downloaded that whole archive in a minute.
8maki, almost 14 years ago
Oh, it's an awesome dump. Are these mainly from the US?
chrisjsmith, almost 14 years ago
All that is meaningless chatter between people and information about bathroom habits. Perhaps if we pooled that distributed effort into something constructive, the world would be a better place.