TechEcho

12 comments

sneak over 11 years ago

Obligatory:

https://en.wikipedia.org/wiki/AOL_search_data_leak
http://techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/
http://www.nytimes.com/2006/08/09/technology/09aol.html?pagewanted=all

TL;DR: It's fairly easy to deanonymize datasets like this, provided they are somewhat complete.

danso over 11 years ago

Jesus Christ. The bulk scraping in violation of the TOS is egregious enough, but redistributing it with a mandate that the researchers get credit? For what, scraping a generous public API?

nicholassmith over 11 years ago

It doesn't look like Foursquare handed that over. What's the legality of scraping a service for its data in this way?

galapago over 11 years ago

http://webcache.googleusercontent.com/search?q=cache:hLI5FqDixY8J:www-users.cs.umn.edu/~sarwat/foursquaredata/+&cd=1&hl=en&ct=clnk

(The direct link is not working, but this confirms that it was freely available.)

boothead over 11 years ago

No mention of the data format. Is it JSON, CSV, or what? I know you can always head -n the file, but a little hint would be helpful!
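For anyone who does end up inspecting the files by hand, here is a minimal sketch of the `head -n` approach in Python. The thread does not say what the archive contains, so the file path is a placeholder, and `peek` and `guess_format` are hypothetical helper names, not part of any published tooling for this data set. The format guess is deliberately rough: it tries to parse the first line as JSON, then falls back to the standard library's `csv.Sniffer`.

```python
import csv
import json
from itertools import islice


def peek(path: str, n: int = 5) -> list[str]:
    """Return the first n lines of a text file, similar to `head -n`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def guess_format(lines: list[str]) -> str:
    """Rough guess at the format: 'json', 'delimited (<sep>)', or 'unknown'."""
    if not lines:
        return "unknown"
    first = lines[0].strip()
    # A first line that parses as a JSON object or array suggests JSON (lines).
    if first.startswith(("{", "[")):
        try:
            json.loads(first)
            return "json"
        except json.JSONDecodeError:
            pass
    # Otherwise let csv.Sniffer look for a consistent delimiter in the sample.
    try:
        dialect = csv.Sniffer().sniff("\n".join(lines))
        return f"delimited ({dialect.delimiter!r})"
    except csv.Error:
        return "unknown"


# Hypothetical usage; replace the placeholder with a real extracted file:
# print(guess_format(peek("path/to/extracted_file")))
```

A guess like this only narrows things down; a README or schema from the publishers would still be the proper answer to the question.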

interskh over 11 years ago

> This data set contains 2153471 users, 1143092 venues, 1021970 check-ins, 27098490 social connections, and 2809581 ratings that users assigned to venues

The number of check-ins seems low compared to the other numbers.
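To put that in numbers, here is a quick sketch that divides the quoted counts by the number of users. It uses only the figures from the description above, not the data files, and it assumes each count is a plain record total (for example, that a social connection is counted once per record); the `per_user` helper is just an illustrative name.

```python
# Counts quoted from the data set description above.
counts = {
    "users": 2_153_471,
    "venues": 1_143_092,
    "check_ins": 1_021_970,
    "social_connections": 27_098_490,
    "ratings": 2_809_581,
}


def per_user(key: str) -> float:
    """Average number of `key` records per user."""
    return counts[key] / counts["users"]


for key in ("check_ins", "ratings", "social_connections"):
    print(f"{key} per user: {per_user(key):.2f}")
```

On these figures there is fewer than one check-in for every two users, while there is more than one rating per user, which is what makes the check-in count look small.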

davidmat over 11 years ago

Could anyone recommend some solid introductory material on data analysis/data visualisation?

I'm thinking this data set seems like a fun way to fill a rainy weekend, going for a dive into these worlds :)

m4tthumphrey over 11 years ago

Looks like it's been removed. Damn.

Edit: Not removed, just inaccessible. 403.

renownedmedia over 11 years ago

Looks like the data is only up-to-date as of July 2012 (judging from the zip compression times).

xntrk over 11 years ago

Sounded too good to be true. I guess we'll have to find it on BitTorrent.

rajbala over 11 years ago

The data set has been removed?

waynesonfire over 11 years ago

Why was this not posted as a torrent?

Foursquare dataset free to download and analyze
