How To Get Experience Working With Large Datasets

36 points by m3mb3r over 14 years ago

6 comments

drblast over 14 years ago
US Census data is multiple gigabytes, and well documented.

If you want to run a database through its paces beyond the point where all the data fits in memory, that's a good place to start.
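For anyone who wants a feel for the "doesn't fit in memory" case before loading a real database, one common approach is to stream the file row by row and keep only running aggregates. This is a minimal sketch, not anything from the comment above: the `state` and `population` column names and the `sum_by_key` helper are hypothetical and do not reflect the actual Census file layout.

```python
import csv
import io
from collections import defaultdict


def sum_by_key(lines, key_col, value_col):
    """Stream CSV rows and keep only one running total per key.

    Memory use grows with the number of distinct keys, not the number of rows.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row[key_col]] += int(row[value_col])
    return dict(totals)


# Tiny in-memory stand-in with made-up columns (an assumption).
# For a real multi-gigabyte file, pass open(path, newline="") instead.
sample = io.StringIO("state,population\nNY,10\nCA,20\nNY,5\n")
result = sum_by_key(sample, "state", "population")
```

Because `csv.DictReader` pulls one line at a time from the file object, the same function should work unchanged on a file far larger than RAM, as long as the set of distinct keys stays small.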
wrath over 14 years ago
One nice and free dataset which you can play with is the BestBuy open data. You can download the full catalog of products from BestBuy in JSON and XML format: http://developer.bestbuy.com. Simply register for a key and you'll have access to the data.
andrewjshults over 14 years ago
Along the same lines, NYC's Big Apps 2.0 competition is going on right now (http://nycbigapps.com/). Not affiliated, but I went to NYTM last year where they demoed the winners, and there are some interesting (and impressively large) datasets to play with. One of my favorites was the mobile app CabSense, which crunched the TLC data to determine the best corners to catch a cab on, depending on the time of day.
fmw over 14 years ago
They might be relatively small, but http://www.grouplens.org/node/12 has some interesting datasets that can be used to experiment with recommendation systems, e.g. book and movie reviews.
ashtophoenix over 14 years ago
What a silly article. When it said "how to get experience working with large datasets", I was expecting it to explain more about storage, scalability, design, caching issues, etc. There are myriad ways to get (or generate) data to play with...
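As a rough illustration of the "generate" option, a few lines of Python can write as many synthetic rows as you like straight to disk without holding them in memory. This is only a sketch under stated assumptions: the `write_synthetic_rows` helper, the column names, and the value ranges are all made up for the example.

```python
import csv
import io
import random


def write_synthetic_rows(out, n_rows, seed=0):
    """Write a header plus n_rows of fake (user_id, item_id, rating) rows as CSV.

    Rows are written one at a time, so memory use stays flat however
    large n_rows gets. The schema is hypothetical.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same file
    writer = csv.writer(out)
    writer.writerow(["user_id", "item_id", "rating"])
    for _ in range(n_rows):
        writer.writerow(
            [rng.randrange(1_000_000), rng.randrange(50_000), rng.randint(1, 5)]
        )
    return n_rows


# Small in-memory demo; for a large file, pass open(path, "w", newline="")
# and raise n_rows until the output no longer fits in RAM.
buf = io.StringIO()
write_synthetic_rows(buf, 1000)
lines = buf.getvalue().splitlines()
```

Uniform random data like this is fine for exercising storage and throughput, but it will not show the skew and messiness of real datasets.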
earl over 14 years ago
What's with the recent fetishization of Big Data? I'm moving to Dziuba's camp -- it's a developer dick size contest.