US Census data runs to multiple gigabytes and is well documented.<p>If you want to put a database through its paces beyond the point where all the data fits in memory, that's a good place to start.
One nice, free dataset you can play with is the BestBuy open data. You can download the full BestBuy product catalog in JSON or XML format: <a href="http://developer.bestbuy.com" rel="nofollow">http://developer.bestbuy.com</a> Simply register for a key and you'll have access to the data.
Along the same lines, NYC's Big Apps 2.0 competition is going on right now (<a href="http://nycbigapps.com/" rel="nofollow">http://nycbigapps.com/</a>). I'm not affiliated, but I went to NYTM last year where they demoed the winners, and there are some interesting (and impressively large) datasets to play with. One of my favorites was the mobile app CabSense, which crunched the TLC data to determine the best corners to catch a cab, depending on the time of day.
They might be relatively small, but <a href="http://www.grouplens.org/node/12" rel="nofollow">http://www.grouplens.org/node/12</a> has some interesting datasets that can be used to experiment with recommendation systems, e.g. book and movie reviews.
What a silly article. When it said how to get experience working with large datasets, I was expecting it to explain more about storage, scalability, design, caching issues, etc.
There are myriad ways to get (or generate) data to play with...
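For the "generate" route, here is a minimal Python sketch of one way to do it: stream synthetic rows straight to a CSV so the file can grow past RAM without the generator itself needing much memory. The function names (`generate_rows`, `write_csv`), the column names, and the value ranges are all made up for illustration; swap in whatever shape your database test needs.

```python
import csv
import random

def generate_rows(n_rows, seed=0):
    """Yield n_rows synthetic (user_id, item_id, rating) tuples, one at a time."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    for _ in range(n_rows):
        # Hypothetical ranges: 1M users, 50k items, ratings from 1 to 5.
        yield (rng.randrange(1_000_000), rng.randrange(50_000), rng.randint(1, 5))

def write_csv(out, n_rows, seed=0):
    """Stream rows to a file-like object without holding them all in memory."""
    writer = csv.writer(out)
    writer.writerow(["user_id", "item_id", "rating"])  # header row
    for row in generate_rows(n_rows, seed):
        writer.writerow(row)
```

To use it, open a file with `open("synthetic_ratings.csv", "w", newline="")` (the filename is just an example), pass it to `write_csv`, and raise `n_rows` until the result no longer fits in memory.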