Before the App Store hit, when iPhone web applications were all the rage, I started working on a restaurant locator. Oddity Software was a company I came across that provides datasets like the ones from AggData, though I'm not sure whether their data comes from scraping the web. I figured I'd give it a mention in case people came here searching for additional resources. They definitely have more in the way of free lists (<a href="http://www.odditysoftware.com/free_lists.html" rel="nofollow">http://www.odditysoftware.com/free_lists.html</a>), though I can't personally vouch for accuracy or timely updates as I haven't used them.<p>Listable (<a href="http://www.listable.org" rel="nofollow">http://www.listable.org</a>) is another list-type service, though its lists are much less complex and are user-created.<p>I'll be adding AggData to my bookmarks, though. I could see myself using at least one of their "FreeData" lists in the future and possibly some of their paid ones.
Wow, this discussion is way deeper than we have ever gotten into at AggData. In fact, "frig", I think we may need to hire you. :) We have been very particular about the type of data we collect for some of these very reasons, and we feel that the location data is sufficiently in the public domain to protect us from infringement allegations. We don't currently have much in place to pursue those trying to resell our data, and it hasn't really been a problem yet. As mentioned, I think reselling doesn't make much economic sense.<p>A couple of other quick responses: yes, we know our search is kind of lacking right now, and we're working to fix it. Also, we have major plans to offer bulk data and specific regional data; we're currently just working on expanding our library, though.<p>Thank you, everyone, for your insight!
-Chris Hathaway, AggData LLC<p>(and seriously, frig, send us a message on our contact page, I have more questions for you)
Hey, AggData guys, why not change your business model and sell your data in bulk? Wouldn't it be nice to use it this way?<p><pre><code># Imagined interface, not an existing package.
from aggdata.dealership_locations import cadillac

print("Cadillac Dealers in NY:")
for loc in cadillac:
    if loc.city == 'New York':
        print(loc.address, loc.phonenumber)</code></pre>
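<p>For what it's worth, something close to this could be approximated today on top of a plain CSV export. This is only a rough sketch under assumptions: the column names (city, address, phonenumber), the `Location` type, the `load_locations` helper, and the sample rows are all made up for illustration and are not AggData's actual file format or API.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type; field names mirror the snippet above.
Location = namedtuple("Location", ["city", "address", "phonenumber"])


def load_locations(csv_file):
    """Yield one Location per row of a CSV file.

    Assumes the file has 'city', 'address' and 'phonenumber' columns.
    """
    for row in csv.DictReader(csv_file):
        yield Location(row["city"], row["address"], row["phonenumber"])


# Made-up sample rows standing in for a purchased bulk file.
SAMPLE_CSV = """city,address,phonenumber
New York,1 Example Ave,555-0100
Albany,2 Sample St,555-0101
"""

cadillac = list(load_locations(io.StringIO(SAMPLE_CSV)))

# Same filter as in the snippet above, written as a list comprehension.
ny_dealers = [loc for loc in cadillac if loc.city == "New York"]

print("Cadillac Dealers in NY:")
for loc in ny_dealers:
    print(loc.address, loc.phonenumber)
```

With a real file you would pass `open("some_file.csv", newline="")` instead of the in-memory sample; the file name here is a placeholder.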
Their "FreeData" sets could use some attention.<p>I realize it is <i>free</i>, but if you are going to offer it as an example of what you do, make sure it is correct and up to date.<p>The headers for the Congress data are completely off, and the data is not current: Franken, Kennedy, etc.
If you are interested in data, here are some sites to get it from:<p><a href="http://theinfo.org/get/data" rel="nofollow">http://theinfo.org/get/data</a><p><a href="http://infochimps.org" rel="nofollow">http://infochimps.org</a><p><a href="http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=243" rel="nofollow">http://developer.amazonwebservices.com/connect/kbcategory.js...</a><p><a href="http://ckan.net" rel="nofollow">http://ckan.net</a><p>EDIT: Comprehensive list here:
<a href="http://www.datawrangling.com/some-datasets-available-on-the-web" rel="nofollow">http://www.datawrangling.com/some-datasets-available-on-the-...</a>
I have looked into starting a business like this before; there are quite a few of them, and I do like scraping.<p>But don't you have to break a lot of 'terms of use' agreements to scrape this data? Could you get in legal trouble for that?
Only "Locations" kinds of data? And the search is awful: it couldn't find anything for McDonalds, for example (<a href="http://aggdata.com/search/node/McDonalds" rel="nofollow">http://aggdata.com/search/node/McDonalds</a>).<p>I was hoping to use it as a possible alternative to <a href="http://archive.ics.uci.edu/ml/" rel="nofollow">http://archive.ics.uci.edu/ml/</a> for ML data sets, but now I am kind of disappointed.