List of high-quality open datasets in public domains

388 points by Jasamba, over 9 years ago

13 comments

dvcrn, over 9 years ago
I don't quite understand these awesome lists. From what I've seen it usually ends up being a way for creators to promote their stuff and for the list creator to have a big project with a few thousand stars in their profile. So when I did something in, say, Electron, I would go to the awesome-electron list and add it there for promotion's sake.

I couldn't find a use case for these lists myself yet. There is no way to verify the quality of the product or the activity (stars for example? last commit date?).

In one case I searched for AWS adapters for a language, clicked on all links inside awesome-{{language}} just to find that all of them are inactive or a few days old. I ended up using something I found on Google instead.
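The checks dvcrn mentions (stars, last commit date, projects that are only a few days old) can be sketched as a simple filter. This is a minimal illustration, not something from the list or the thread: the `ListEntry` record, the thresholds, and the sample projects are all hypothetical, and in practice the metadata would have to be fetched from the hosting site.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ListEntry:
    # Hypothetical record for one project linked from an awesome list.
    name: str
    stars: int
    first_commit: date
    last_commit: date


def looks_maintained(entry, today, min_stars=50,
                     max_idle=timedelta(days=365),
                     min_age=timedelta(days=90)):
    """Return True if the entry passes three rough health checks.

    The thresholds are arbitrary assumptions chosen for illustration.
    """
    recently_active = today - entry.last_commit <= max_idle
    old_enough = today - entry.first_commit >= min_age
    return entry.stars >= min_stars and recently_active and old_enough


def filter_entries(entries, today):
    """Keep only entries that pass looks_maintained, most-starred first."""
    kept = [e for e in entries if looks_maintained(e, today)]
    return sorted(kept, key=lambda e: e.stars, reverse=True)


# Made-up sample data, for illustration only.
today = date(2016, 2, 1)
sample = [
    ListEntry("stale-adapter", 900, date(2012, 1, 1), date(2013, 6, 1)),
    ListEntry("brand-new-adapter", 120, date(2016, 1, 25), date(2016, 1, 30)),
    ListEntry("steady-adapter", 300, date(2014, 3, 1), date(2016, 1, 15)),
]
print([e.name for e in filter_entries(sample, today)])  # ['steady-adapter']
```

Stars and commit dates are only proxies for quality, so a filter like this can at best rule out the inactive and brand-new entries described in the comment.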
minimaxir, over 9 years ago
69 points, #3 on Hacker News, and no comments? :P

This list would be much improved with descriptions for each dataset and an indication of schema, as some of the datasets listed have very unfriendly schemas (e.g. the IMDB interfaces link).

Kaggle's recently released Public Datasets feature (https://www.kaggle.com/datasets) provides an interesting approach to presenting data and qualifying datasets by giving good examples of data robustness.
davecap1, over 9 years ago
SolveBio (my startup) has parsed, normalized, and indexed a bunch of the datasets listed under biology. Our goal is to make these kinds of datasets easier to access for programmers and non-programmers alike, similar to some other sites mentioned here (Enigma and Quandl) but for genomics. You can query and filter the data on the website or through one of our API clients: https://www.solvebio.com/library
discardorama, over 9 years ago
The author's notion of a "dataset" is weird. Under "Finance", there's a link to the Google Finance page (http://finance.google.com/). How is that a "dataset"??
chestnut-tree, over 9 years ago
For those in the UK, the available Government datasets are published on http://www.data.gov.uk

The datasets are not public domain, but licensed under the Open Government Licence (which allows you to use and adapt the data for commercial use).

There's also the Global Open Data Index: a website that ranks countries by how much Government data is available as open datasets based on certain criteria. The current top spot is taken by Taiwan.

     1. Taiwan
     2. UK
     3. Denmark
     4. Colombia
     5. Finland
     5. Australia
     7. Uruguay
     8. USA
     8. Netherlands
    10. Norway
    10. France

http://index.okfn.org/place/
clockwerx, over 9 years ago
I wish linkeddata.org or CKAN installs weren't being reinvented here, but instead CKAN supported pull requests or similar decentralized ways to publish new data sets.
yzh, over 9 years ago
For the complex network part, I think the collection missed this one: http://www.networkrepository.com/ The site itself is a collection of several publicly available network datasets.
jack9, over 9 years ago
I noticed no http://commoncrawl.org/ (oh no, naked domain!) or http://www.cochrane.org/

I don't quite understand the criteria for being included in the list since I think it's:

https://groups.google.com/forum/#!forum/awesomepublicdatasets
patrickk, over 9 years ago
Betfair Historical Exchange Data requires you to have "100 Betfair points" which you acquire by gambling on their site. It's hardly an open dataset.
Spooky23, over 9 years ago
Check out data.ny.gov

Also nycopendata.socrata.com
lifeisstillgood, over 9 years ago
Is it too late to create a central registry of datasets, to aid discoverability? A voluntary system maintained by convention?

Perhaps a distributed registration system à la DNS?
tylercubell, over 9 years ago
Enigma.io is great for public data too.
legulere, over 9 years ago
It's strange that they put Wikidata under natural language.