About the data:<p>- DB size: 543 million rows<p>- Data size: 173 GB (uncompressed)<p>- Stored in MySQL<p>- 200+ million tweets from 13+ million users<p>- Collected in 1 week<p>- Operating costs: $100+<p>- Rackspace Cloud - 1 CentOS server with 8 GB RAM<p>- Java, memcache, MySQL, and Perl for core processing<p>- JS and PHP for analytics & visualization<p>* Download the data at this URL:
http://www.archive.org/details/2011-06-calufa-twitter-sql
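A quick back-of-the-envelope check on what the figures above imply. This is pure arithmetic on the stated numbers (200M+ tweets, 13M+ users, 1 week of collection), so the results are lower bounds:

```python
# Rough throughput implied by the stats above.
tweets = 200_000_000          # "200+ Million tweets" (lower bound)
users = 13_000_000            # "13+ Million users" (lower bound)
seconds_per_week = 7 * 24 * 3600

tweets_per_second = tweets / seconds_per_week
tweets_per_user = tweets / users

print(f"~{tweets_per_second:.0f} tweets/sec sustained")  # ~331 tweets/sec
print(f"~{tweets_per_user:.1f} tweets per user")         # ~15.4 tweets/user
```

So the scraper sustained at least ~330 tweets/sec for a week on a single 8 GB server, which helps explain the memcache layer in front of MySQL.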
Twitter changed their ToS to explicitly disallow distributing Twitter dumps like this: <a href="http://chronicle.com/blogs/profhacker/the-end-of-twapperkeeper-and-what-to-do-about-it/31582" rel="nofollow">http://chronicle.com/blogs/profhacker/the-end-of-twapperkeep...</a><p>I was part of the Web Ecology Project (and 140kit.com), both of which gave large Twitter datasets to researchers.
Thanks! I'm more interested in the scraper. Is it open source? If so, where can we download it? If not, can you write about your experience building it?
Neat! Here are some tips for creating a kick-ass graph visualization: <a href="http://www.martinlaprise.info/2010/02/15/visualize-your-own-twitter-graph-part-2/" rel="nofollow">http://www.martinlaprise.info/2010/02/15/visualize-your-own-...</a>
All that is meaningless chatter between people and information about bathroom habits. Perhaps if we pooled that distributed effort into something constructive, the world would be a better place.