Obligatory:<p><a href="https://en.wikipedia.org/wiki/AOL_search_data_leak" rel="nofollow">https://en.wikipedia.org/wiki/AOL_search_data_leak</a><p><a href="http://techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/" rel="nofollow">http://techcrunch.com/2006/08/06/aol-proudly-releases-massiv...</a><p><a href="http://www.nytimes.com/2006/08/09/technology/09aol.html?pagewanted=all" rel="nofollow">http://www.nytimes.com/2006/08/09/technology/09aol.html?page...</a><p>TL;DR: It's fairly easy to deanonymize datasets like this, provided they are somewhat complete.
Jesus Christ. The bulk scraping in violation of the TOS is egregious enough, but redistributing it with a mandate that the researchers get credit? For what, scraping a generous public API?
<a href="http://webcache.googleusercontent.com/search?q=cache:hLI5FqDixY8J:www-users.cs.umn.edu/~sarwat/foursquaredata/+&cd=1&hl=en&ct=clnk" rel="nofollow">http://webcache.googleusercontent.com/search?q=cache:hLI5FqD...</a><p>(The direct link is not working, but this cached copy confirms that the data set was freely available.)
> This data set contains 2153471 users, 1143092 venues, 1021970 check-ins, 27098490 social connections, and 2809581 ratings that users assigned to venues<p>The number of check-ins seems low compared to the other numbers: fewer than one check-in per venue, and roughly one for every two users.
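A quick way to put those counts side by side, as a rough sketch using only the figures quoted above (the dictionary keys and the `per` helper are made up for illustration, not part of the data set):

```python
# Counts copied from the data set description quoted above.
counts = {
    "users": 2153471,
    "venues": 1143092,
    "check_ins": 1021970,
    "social_connections": 27098490,
    "ratings": 2809581,
}


def per(numerator: str, denominator: str) -> float:
    """Return how many `numerator` items there are per `denominator` item."""
    return counts[numerator] / counts[denominator]


check_ins_per_user = per("check_ins", "users")
check_ins_per_venue = per("check_ins", "venues")
ratings_per_check_in = per("ratings", "check_ins")
connections_per_user = per("social_connections", "users")

print(f"check-ins per user:          {check_ins_per_user:.2f}")
print(f"check-ins per venue:         {check_ins_per_venue:.2f}")
print(f"ratings per check-in:        {ratings_per_check_in:.2f}")
print(f"social connections per user: {connections_per_user:.2f}")
```

If the description is accurate, there are more ratings than check-ins, which suggests the check-in table may be only a sample of activity rather than a complete log.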
Could anyone recommend some solid introductory material on data analysis/data visualisation?<p>I'm thinking this data set seems like a fun way to fill a rainy weekend, going for a dive into these worlds :)