TechEcho
Foursquare dataset free to download and analyze

113 points by rsobers over 11 years ago

12 comments

sneak over 11 years ago
Obligatory:

https://en.wikipedia.org/wiki/AOL_search_data_leak

http://techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/

http://www.nytimes.com/2006/08/09/technology/09aol.html?pagewanted=all

TL;DR: It's fairly easy to deanonymize datasets like this, provided they are somewhat complete.
danso over 11 years ago
Jesus Christ. The bulk scraping in violation of the TOS is egregious enough, but redistributing it with a mandate that the researchers get credit? For what, scraping a generous public API?
nicholassmith over 11 years ago
That doesn't look like Foursquare has handed that over. What's the legality of scraping a service for their data in this way?
galapago over 11 years ago
http://webcache.googleusercontent.com/search?q=cache:hLI5FqDixY8J:www-users.cs.umn.edu/~sarwat/foursquaredata/+&cd=1&hl=en&ct=clnk

(The direct link is not working, but this cached copy confirms that the data was freely available.)
boothead over 11 years ago
No mention of the data format. Is it JSON, CSV, or what? I know you can always head -n the file, but a little hint would be helpful!
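
For anyone who wants to do the "head -n" check from a script rather than the shell, here is a minimal sketch in Python. It makes no claim about the actual format of this data set: the function names `peek` and `guess_format` are made up for illustration, and the heuristics (leading brace or bracket means JSON, a tab means TSV, a comma means CSV) are assumptions that will misfire on plenty of real files.

```python
from itertools import islice


def peek(path, n=5, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return [line.rstrip("\r\n") for line in islice(f, n)]


def guess_format(lines):
    """Rough guess at the format, based only on the first non-empty line.

    These rules are illustrative assumptions, not a real format detector.
    """
    first = next((l.strip() for l in lines if l.strip()), "")
    if first.startswith(("{", "[")):
        return "json"
    if "\t" in first:
        return "tsv"
    if "," in first:
        return "csv"
    return "unknown"


# Hypothetical usage; "some_file.dat" is a placeholder, not a file
# known to exist in the archive:
#   lines = peek("some_file.dat", 5)
#   print(guess_format(lines), lines)
```

Reading through `islice` keeps this cheap on large files, since only the first few lines are ever pulled from disk.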
interskh over 11 years ago
> This data set contains 2153471 users, 1143092 venues, 1021970 check-ins, 27098490 social connections, and 2809581 ratings that users assigned to venues

The number of check-ins seems low compared to the other numbers.
davidmat over 11 years ago
Could anyone recommend some solid introductory material on data analysis/data visualisation?

I'm thinking this data set seems like a fun way to fill a rainy weekend, going for a dive into these worlds :)
m4tthumphrey over 11 years ago
Looks like it's been removed. Damn.

Edit: Not removed, just inaccessible. 403.
renownedmedia over 11 years ago
Looks like the data is only up-to-date as of July 2012 (judging from the zip compression times).
xntrk over 11 years ago
Sounded too good to be true. I guess we'll have to find it on BitTorrent.
rajbala over 11 years ago
The data set has been removed?
waynesonfire over 11 years ago
Why was this not posted as a torrent?