TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

AggData: Datasets created from scraping the web

62 points by adamhowell over 15 years ago

7 comments

weirdwes over 15 years ago
Before the App Store hit and iPhone web applications were all the rage, I started working on a restaurant locator. Oddity Software was a company I came across that provides datasets like the ones from AggData, though I'm not sure if it's from scraping the web. I figured I'd give it a mention in case people came here searching for additional resources. They definitely have more in the way of free lists (http://www.odditysoftware.com/free_lists.html), though I can't personally vouch for accuracy or timely updates as I haven't used them.

Listable (http://www.listable.org) is another list-type service, though its lists are much less complex and are user-created.

I'll be adding AggData to my bookmarks, though. I could see myself using at least one of their "FreeData" lists in the future and possibly some of their paid ones.
aggdata over 15 years ago
Wow, this discussion is way deeper than we have ever gotten into at AggData. In fact, "frig", I think we may need to hire you. :) We have been very particular in the type of data we collect for some of these very reasons, and we feel that the location data was enough in the public domain to protect us from infringement allegations. We don't currently have much in place to pursue those trying to resell our data, and it hasn't really been a problem yet. I think, as mentioned, it doesn't make much economic sense.

A couple of other quick responses: yes, we know our search is kind of lacking now, and we're working to fix it. Also, we have major plans to offer bulk data and specific regional data; we're currently just working on expanding our library, though.

Thank you, everyone, for your insight! -Chris Hathaway, AggData LLC

(and seriously, frig, send us a message on our contact page, I have more questions for you)
toppy over 15 years ago
Hey, AggData guys, why not change your business model and sell your data in bulk? Wouldn't it be nice to use it that way?

    # Hypothetical module: this is how I wish the data were packaged
    from aggdata.dealership_locations import cadillac

    print("Cadillac Dealers in NY:")
    for loc in cadillac:
        if loc.city == 'New York':
            print(loc.address, loc.phonenumber)
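
A runnable take on the snippet above, offered only as a minimal sketch. No `aggdata` Python package is known to exist, so the `Location` record, the `dealers_in` helper, and the sample rows are all made-up stand-ins; only the field names (`address`, `city`, `phonenumber`) and the `cadillac` name are borrowed from the snippet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One dealership record; field names follow the snippet above."""
    address: str
    city: str
    phonenumber: str


# Made-up placeholder rows standing in for a real bulk dataset.
cadillac = [
    Location("1 Example Ave", "New York", "555-0100"),
    Location("2 Sample St", "Albany", "555-0101"),
    Location("3 Placeholder Blvd", "New York", "555-0102"),
]


def dealers_in(locations, city):
    """Return the records whose city matches exactly (case-sensitive)."""
    return [loc for loc in locations if loc.city == city]


if __name__ == "__main__":
    print("Cadillac Dealers in NY:")
    for loc in dealers_in(cadillac, "New York"):
        print(loc.address, loc.phonenumber)
```

If the data shipped as CSV instead, the same `Location` records could presumably be built row by row with the standard `csv` module, leaving the filtering code unchanged.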
timmaah over 15 years ago
Their "FreeData" sets could use some attention.

I realize it is free, but if you are going to have it as an example of what you do, have it correct and up to date.

The headers for the congress data are completely off, and the data is not current (Franken, Kennedy, etc.).
vijayr over 15 years ago
If you are interested in data, here are some sites to get it from:

http://theinfo.org/get/data

http://infochimps.org

http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=243

http://ckan.net

EDIT: Comprehensive list here: http://www.datawrangling.com/some-datasets-available-on-the-web
utnick over 15 years ago
I have looked into starting a business like this before. There are quite a few of them, and I do like scraping.

But don't you have to break a lot of "terms of use" agreements to scrape this data? Could you get in legal trouble for that?
mitko over 15 years ago
Only "Locations" kinds of data? And the search is awful: it couldn't find anything for McDonalds, for example (http://aggdata.com/search/node/McDonalds).

I was hoping to use it as a possible alternative to http://archive.ics.uci.edu/ml/ for ML data sets, but now I am kind of disappointed.