Data sets released by Google

170 points by supo over 11 years ago

13 comments

aidanf over 11 years ago
If you want to play around with data, here's another good list of open/free datasets: http://bitly.com/bundles/hmason/1
gtani over 11 years ago
Here are some other data hubs/search engines and endless lists:

http://datahub.io/

http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/

http://tm.durusau.net/?p=39312

http://dvn.iq.harvard.edu/dvn/

_____________

This subreddit seems like a decent place to ask questions:

http://www.reddit.com/r/datasets
imurray over 11 years ago
Another one from Google, 1000 scanned books for OCR and other scanned document processing research: http://commondatastorage.googleapis.com/books/icdar2007/README.txt
thangalin over 11 years ago
http://bitly.com/bundles/hmason/1
http://commondatastorage.googleapis.com/books/icdar2007/README.txt
http://datahub.io/
http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
http://iatiregistry.org/
http://open.undp.org/
http://data.worldbank.org/
https://explore.data.gov/catalog/raw/
http://www.data.gov/opendatasites
http://data.gov.be/datasets
http://opencorporates.com/
http://glasspockets.org/work/reportingcommitment/api.html
http://thedata.harvard.edu/dvn/
http://www.reddit.com/r/datasets
http://archive.ics.uci.edu/ml/
http://cleandatahub.org/
http://datacatalogs.org/
http://archive.org/details/oxford-2005-facebook-matrix
X4 over 11 years ago
BitTorrent, please! Why does it cost so much? They grabbed our data for free and they have enough free bandwidth. Even if we assume they are greedy, they could at least offer it through BitTorrent. Shipping DVDs for that amount of data is ridiculous. I don't even have a DVD reader…

I can't afford to buy all that plus shipping to Europe, but I would like to play with the data for my NLP project.
agibsonccc over 11 years ago
Here's another good one.

http://archive.ics.uci.edu/ml/
avidas over 11 years ago
Here is a good one: http://cleandatahub.org/ They are trying to aggregate cleaned data sets from across the web.
PaulHoule over 11 years ago
No links...

Remember the days when people used to make links on the web because they weren't greedy with their pagerank?

At least Google left us some machine learning data sets after they took all the links. You just can't find them because nobody links to them.
ChikkaChiChi over 11 years ago
Fantastic links throughout this thread.

When playing with new programming languages, instead of a 'todo' list I always end up building an XKCD password generator. Interestingly enough, I've never found a frequency/comprehension list worth using to populate it for public consumption.
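
For anyone who hasn't built one, an XKCD-style generator is roughly the sketch below. It is only an illustration: the `WORDS` list and the `xkcd_password` function name are made up for this example, and the list is a tiny placeholder that is far too small for real use. The missing piece is exactly the one described above, a good list of common, easily understood words to load instead.

```python
import secrets

# Hypothetical placeholder list. A usable generator would load thousands of
# common words from a frequency list instead of hard-coding a dozen.
WORDS = [
    "correct", "horse", "battery", "staple", "orange", "window",
    "river", "candle", "planet", "garden", "silver", "thunder",
]


def xkcd_password(words, count=4, sep="-"):
    """Join `count` words picked uniformly at random from `words`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if not words:
        raise ValueError("words must not be empty")
    # secrets.choice draws from the OS's cryptographically secure RNG,
    # unlike random.choice, which is not meant for passwords.
    return sep.join(secrets.choice(words) for _ in range(count))


if __name__ == "__main__":
    print(xkcd_password(WORDS))
```

The strength comes from the size of the word list and the number of words, so swapping in a larger list matters more than anything in the code itself.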
option_greek over 11 years ago
Is there any data set that embodies human relationships with everyday objects?
ma2rten over 11 years ago
Also:

https://code.google.com/p/word2vec/#Pre-trained_entity_vectors_with_Freebase_naming
kineticfocus over 11 years ago
The ML competition site Kaggle should also get a mention here: http://www.kaggle.com/competitions
chatman over 11 years ago
Where is the Web1T dataset? Would you not consider it useful for Machine Learning?